Patent abstract:
Techniques and systems are provided for deriving one or more sets of affine motion parameters at a decoder. For example, the decoder can obtain video data from an encoded video bitstream. The video data includes at least a current picture and a reference picture. The decoder can determine a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters can be used to perform motion compensation prediction for the current block. The set of affine motion parameters can be determined using a current affine template of the current block and a reference affine template of the reference picture. In some cases, an encoder can determine a set of affine motion parameters for a current block using a current affine template of the current block and a reference affine template of the reference picture, and can generate an encoded video bitstream that includes a syntax element indicating that a template-matching-based affine motion derivation mode is to be used by a decoder for the current block. The encoded video bitstream may not include any affine motion parameters for determining the set of affine motion parameters.
Publication number: BR112019018866A2
Application number: R112019018866
Filing date: 2018-03-13
Publication date: 2020-04-14
Inventors: Chuang Hsiao-Chiang;Chen Jianle;Karczewicz Marta;Chien Wei-Jung;Li Xiang;Chen Yi-Wen;Sun Yu-Chen
Applicant: Qualcomm Inc;
Primary IPC:
Patent description:

DERIVATION OF AFFINE MOTION INFORMATION
FIELD [0001] This application relates to video coding and compression. For example, systems and methods are described for deriving affine motion.
BACKGROUND [0002] Many devices and systems allow video data to be processed and output for consumption. Digital video data includes large amounts of data to meet the demands of consumers and video providers. For example, consumers of video data desire video of the highest quality, with high fidelity, resolutions, frame rates, and the like. As a result, the large amount of video data that is required to meet these demands places a burden on the communication networks and on the devices that process and store the video data.
[0003] Various video coding techniques can be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include high efficiency video coding (HEVC), advanced video coding (AVC), moving picture experts group (MPEG) coding, or the like. Video coding generally utilizes prediction methods (for example, inter-prediction, intra-prediction, or the like) that take advantage of redundancy present in video images or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate, while avoiding or minimizing degradations to video quality. With ever-evolving video services becoming available, coding techniques with better coding efficiency are needed.
BRIEF SUMMARY [0004] Techniques and systems are described herein for performing decoder-side affine motion derivation. Affine-motion-based prediction allows complex motions to be estimated, such as rotation, zoom, translation, or any combination thereof, among others. In some cases, using the techniques described herein, affine motion parameters can be determined by a video decoding device (also referred to as a decoder) for one or more blocks of video pictures, without affine motion information having to be signaled to the decoding device. For example, no affine motion parameters (or differences between affine motion parameters and affine motion parameter predictors) are signaled for such an affine motion derivation mode.
[0005] Decoder-side affine motion derivation for a current block can be based on the use of templates. For example, a current affine template that includes spatially neighboring samples of a current block can be used, along with a reference affine template of a reference picture, to determine affine motion parameters for the current block. For example, affine motion parameters for control points of the current affine template can be derived by minimizing the error (or distortion) between the affine prediction (associated with the pixels in the reference affine template) and the reconstructed pixels of the current affine template of the current block. The affine motion parameters define affine motion vectors for the control points of the current affine template. The motion vectors of the control points can then be used to determine motion vectors for the pixels or sub-blocks of the current block.
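As an illustrative sketch (not a normative procedure from this disclosure), a 4-parameter affine model can expand two control-point motion vectors into a per-sub-block motion vector field; the function name, sub-block size, and evaluation at sub-block centers below are assumptions for illustration:

```python
# A minimal sketch of a 4-parameter affine model turning two control-point
# motion vectors into a per-sub-block motion vector field. Names such as
# affine_mv_field and sub_size are illustrative, not from the source.

def affine_mv_field(v0, v1, width, height, sub_size=4):
    """v0, v1: (x, y) motion vectors at the top-left and top-right control
    points of a width x height block. Returns one motion vector per
    sub_size x sub_size sub-block, evaluated at the sub-block center."""
    a = (v1[0] - v0[0]) / width   # scaling term of the affine model
    b = (v1[1] - v0[1]) / width   # rotation term of the affine model
    field = []
    for y in range(sub_size // 2, height, sub_size):
        row = []
        for x in range(sub_size // 2, width, sub_size):
            mv_x = a * x - b * y + v0[0]
            mv_y = b * x + a * y + v0[1]
            row.append((mv_x, mv_y))
        field.append(row)
    return field

# Example: pure translation (v0 == v1) yields the same MV everywhere.
print(affine_mv_field((1.0, 2.0), (1.0, 2.0), 16, 16)[0][0])  # (1.0, 2.0)
```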
[0006] According to at least one example, a method of deriving one or more sets of affine motion parameters at a decoder is provided. The method comprises obtaining, by the decoder, video data from an encoded video bitstream. The video data includes at least a current picture and a reference picture. The method further comprises determining, by the decoder, a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters is used to perform motion compensation prediction for the current block. The set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture.
[0007] In another example, a decoder for deriving one or more sets of affine motion parameters is provided that includes a memory configured to store video data of an encoded video bitstream and a processor. The processor is configured to and can obtain the video data from the encoded video bitstream. The video data includes at least a current picture and a reference picture. The processor is further configured to and can determine a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters is used to perform motion compensation prediction for the current block. The set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture.
[0008] In another example of deriving one or more sets of affine motion parameters at a decoder, a non-transitory computer-readable medium is provided having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: obtain, by the decoder, video data from an encoded video bitstream, the video data including at least a current picture and a reference picture; and determine, by the decoder, a set of affine motion parameters for a current block of the current picture, where the set of affine motion parameters is used to perform motion compensation prediction for the current block, and where the set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture.
[0009] In another example, a decoder for deriving one or more sets of affine motion parameters is provided. The decoder includes means for obtaining video data from an encoded video bitstream. The video data includes at least a current picture and a reference picture. The decoder further includes means for determining a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters is used to perform motion compensation prediction for the current block. The set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture.
[0010] In some aspects, the method, decoders, and computer-readable medium described above for deriving one or more sets of affine motion parameters at a decoder may further comprise: determining motion vectors for a plurality of sub-blocks of the current block using the set of affine motion parameters determined for the current block.
[0011] In some aspects, the method, decoders, and computer-readable medium described above for deriving one or more sets of affine motion parameters at a decoder may further comprise: determining motion vectors for a plurality of pixels of the current block using the set of affine motion parameters determined for the current block.
[0012] In some aspects, determining the set of affine motion parameters for the current block includes: obtaining, by the decoder, a set of initial affine motion parameters; deriving, by the decoder, one or more affine motion vectors for one or more pixels of the current affine template of the current block using the set of initial affine motion parameters, where the current affine template of the current block includes reconstructed pixels neighboring the current block; determining, by the decoder, one or more pixels of the reference affine template of the reference picture using the one or more affine motion vectors derived for the one or more pixels of the current affine template; minimizing, by the decoder, an error between at least the one or more pixels of the current affine template and the one or more pixels of the reference affine template determined using the one or more affine motion vectors; and determining, by the decoder, the set of affine motion parameters for one or more control points of the current affine template based on the minimized error between at least the one or more pixels of the current affine template and the one or more pixels of the reference affine template.
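A hedged sketch of the derivation steps just listed is shown below, assuming a simple candidate search; the helper names (`template_pixels`, `reference`, `perturbations`) are hypothetical, and the actual search strategy is not prescribed here:

```python
# Starting from initial affine parameters, candidate parameter sets are
# scored by the error between the reconstructed current template and the
# reference samples that the candidate's affine MVs point at. All helper
# names are assumptions for illustration only.

def refine_affine_params(init_params, template_pixels, reference,
                         perturbations):
    """template_pixels: list of ((x, y), reconstructed_value) pairs from
    the current affine template. reference: callable mapping a (possibly
    fractional) position to an interpolated reference sample.
    perturbations: callable yielding candidate parameter sets."""
    def cost(params):
        a, b, cx, cy = params
        err = 0
        for (x, y), rec in template_pixels:
            mv_x = a * x - b * y + cx   # affine MV at this template pixel
            mv_y = b * x + a * y + cy
            err += abs(rec - reference(x + mv_x, y + mv_y))
        return err

    best, best_cost = init_params, cost(init_params)
    for cand in perturbations(init_params):  # e.g. small per-parameter offsets
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best
```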
[0013] In some aspects, determining the set of affine motion parameters for the one or more control points of the current affine template includes: determining a plurality of sets of affine motion parameters for the one or more control points of the current affine template using at least the one or more pixels of the current affine template and the one or more pixels of the reference affine template determined using the one or more affine motion vectors; determining a quality metric for each set of affine motion parameters of the plurality of sets of affine motion parameters; and selecting, for the one or more control points of the current affine template, the set of affine motion parameters from the plurality of sets of affine motion parameters that has the lowest metric among the plurality of sets of affine motion parameters. In some examples, the quality metric includes a sum of absolute differences (SAD).
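For illustration only, the SAD-based selection among candidate parameter sets could look like the following sketch, where `predict_template` is a hypothetical stand-in for the motion-compensated prediction of the template samples:

```python
# Keep the candidate parameter set whose prediction of the template has
# the smallest sum of absolute differences (SAD). Illustrative only.

def sad(a, b):
    """Sum of absolute differences between two flat sample sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def select_params(candidates, template, predict_template):
    return min(candidates, key=lambda p: sad(template, predict_template(p)))
```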
[0014] In some aspects, the set of initial affine motion parameters is determined based on a translational motion vector determined for the current block. In some cases, the translational motion vector is determined using frame rate up-conversion (FRUC) template matching.
[0015] In some aspects, the set of initial affine motion parameters is determined based on an affine motion vector of a block neighboring the current block.
[0016] In some aspects, no affine motion parameters are decoded from the encoded video bitstream for determining the set of affine motion parameters.
[0017] In some aspects, the current affine template of the current block includes one or more samples spatially neighboring the current block. In some cases, the spatially neighboring samples include samples of one or more of a top neighboring block or a left neighboring block.
[0018] In some aspects, the current affine template includes an L-shaped block. The L-shaped block includes samples of a top neighboring block of the current block and samples of a left neighboring block of the current block.
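A minimal sketch of gathering such an L-shaped template is given below, assuming `recon` is a two-dimensional array of already-reconstructed picture samples and a template `thickness` of a few samples; both assumptions are illustrative:

```python
# Collect reconstructed samples from the top-neighboring and
# left-neighboring regions of a current block at (x0, y0); together
# these two strips form the "L" around the block.

def l_shaped_template(recon, x0, y0, width, height, thickness=4):
    top = [((x, y), recon[y][x])
           for y in range(max(0, y0 - thickness), y0)
           for x in range(x0, x0 + width)]
    left = [((x, y), recon[y][x])
            for y in range(y0, y0 + height)
            for x in range(max(0, x0 - thickness), x0)]
    return top + left
```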
[0019] In some cases, the decoder is part of a mobile device with a display for displaying decoded video data. In some cases, the decoder is part of a mobile device with a camera for capturing pictures.
[0020] According to at least one other example, a method of encoding video data is provided. The method comprises obtaining the video data. The video data includes at least a current picture and a reference picture. The method further comprises determining a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters is used to perform motion compensation prediction for the current block. The set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture. The method further comprises generating an encoded video bitstream. The encoded video bitstream includes a syntax element indicating that a template-matching-based affine motion derivation mode is to be used by a decoder for the current block. The encoded video bitstream does not include any affine motion parameters for determining the set of affine motion parameters.
[0021] In another example, an encoder for encoding video data is provided that includes a memory configured to store video data and a processor. The processor is configured to and can obtain the video data. The video data includes at least a current picture and a reference picture. The processor is further configured to and can determine a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters is used to perform motion compensation prediction for the current block. The set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture. The processor is further configured to and can generate an encoded video bitstream, where the encoded video bitstream includes a syntax element indicating that a template-matching-based affine motion derivation mode is to be used by a decoder for the current block, and where the encoded video bitstream does not include any affine motion parameters for determining the set of affine motion parameters.
[0022] In another example of encoding video data, a non-transitory computer-readable medium is provided having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: obtain the video data, the video data including at least a current picture and a reference picture; determine a set of affine motion parameters for a current block of the current picture, where the set of affine motion parameters is used to perform motion compensation prediction for the current block, and where the set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture; and generate an encoded video bitstream, where the encoded video bitstream includes a syntax element indicating that a template-matching-based affine motion derivation mode is to be used by a decoder for the current block, and where the encoded video bitstream does not include any affine motion parameters for determining the set of affine motion parameters.
[0023] In another example, an encoder for encoding video data is provided. The encoder includes means for obtaining the video data. The video data includes at least a current picture and a reference picture. The encoder further includes means for determining a set of affine motion parameters for a current block of the current picture. The set of affine motion parameters is used to perform motion compensation prediction for the current block. The set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture. The encoder further includes means for generating an encoded video bitstream. The encoded video bitstream includes a syntax element indicating that a template-matching-based affine motion derivation mode is to be used by a decoder for the current block. The encoded video bitstream does not include any affine motion parameters for determining the set of affine motion parameters.
[0024] In some aspects, the method, encoders, and computer-readable medium described above for encoding video data may further comprise: determining motion vectors for a plurality of sub-blocks of the current block using the set of affine motion parameters determined for the current block.
[0025] In some aspects, the method, encoders, and computer-readable medium described above for encoding video data may further comprise: determining motion vectors for a plurality of pixels of the current block using the set of affine motion parameters determined for the current block.
[0026] In some aspects, determining the set of affine motion parameters for the current block includes: obtaining a set of initial affine motion parameters; deriving one or more affine motion vectors for one or more pixels of the current affine template of the current block using the set of initial affine motion parameters, where the current affine template of the current block includes reconstructed pixels neighboring the current block; determining one or more pixels of the reference affine template of the reference picture using the one or more affine motion vectors derived for the one or more pixels of the current affine template; minimizing an error between at least the one or more pixels of the current affine template and the one or more pixels of the reference affine template determined using the one or more affine motion vectors; and determining the set of affine motion parameters for one or more control points of the current affine template based on the minimized error between at least the one or more pixels of the current affine template and the one or more pixels of the reference affine template.
[0027] In some aspects, determining the set of affine motion parameters for the one or more control points of the current affine template includes: determining a plurality of sets of affine motion parameters for the one or more control points of the current affine template using at least the one or more pixels of the current affine template and the one or more pixels of the reference affine template determined using the one or more affine motion vectors; determining a quality metric for each set of affine motion parameters of the plurality of sets of affine motion parameters; and selecting, for the one or more control points of the current affine template, the set of affine motion parameters from the plurality of sets of affine motion parameters that has the lowest metric among the plurality of sets of affine motion parameters. In some examples, the quality metric includes a sum of absolute differences (SAD).
[0028] In some aspects, the set of initial affine motion parameters is determined based on a translational motion vector determined for the current block. In some cases, the translational motion vector is determined using frame rate up-conversion (FRUC) template matching.
[0029] In some aspects, the set of initial affine motion parameters is determined based on an affine motion vector of a block neighboring the current block.
[0030] In some aspects, the current affine template of the current block includes one or more samples spatially neighboring the current block. In some examples, the spatially neighboring samples include samples of one or more of a top neighboring block or a left neighboring block.
[0031] In some aspects, the current affine template includes an L-shaped block. The L-shaped block includes samples of a top neighboring block of the current block and samples of a left neighboring block of the current block.
[0032] In some aspects, the method, encoders, and computer-readable medium described above for encoding video data may further comprise: storing the encoded video bitstream. In some cases, the processor of the encoder, or an apparatus comprising the encoder, is configured to store the encoded video bitstream in the memory of the encoder or in a memory of the apparatus comprising the encoder.
[0033] In some aspects, the method, encoders, and computer-readable medium described above for encoding video data may further comprise: transmitting the encoded video bitstream. In some cases, the encoder includes a transmitter configured to transmit the encoded video bitstream. In some cases, the encoder is part of an apparatus with a transmitter configured to transmit the encoded video bitstream.
[0034] In some aspects, the encoder is part of a mobile device with a display for displaying decoded video data. In some aspects, the encoder is part of a mobile device with a camera for capturing pictures.
[0035] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.
[0036] The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS [0037] Illustrative examples of various implementations are described in detail below with reference to the following drawing Figures:
[0038] Figure 1 is a block diagram that illustrates an example of an encoding device and a decoding device, according to some examples;
[0039] Figure 2 is a diagram that illustrates an example of a coding unit (CU) structure in HEVC, according to some examples;
[0040] Figure 3 is a diagram that illustrates an example of partition modes for an inter-prediction mode, according to some examples;
[0041] Figure 4A is a diagram that illustrates an example of a method for deriving spatial neighboring motion vector (MV) candidates for merge inter-prediction mode, according to some examples;
[0042] Figure 4B is a diagram that illustrates an example of a method for deriving spatial neighboring MV candidates for advanced motion vector prediction (AMVP) mode, according to some examples;
[0043] Figure 5A is a diagram that illustrates an example of a block partitioning using a quadtree-binary tree (QTBT) structure, according to some examples;
[0044] Figure 5B is a diagram that illustrates a tree structure corresponding to the block partition shown in Figure 5A, according to some examples;
[0045] Figure 6 is a diagram that illustrates an example of a set of coding unit (CU) split modes available in QTBT, according to some examples;
[0046] Figure 7 is a diagram that illustrates an example of a simplified affine motion model for a current block, according to some examples;
[0047] Figure 8 is a diagram that illustrates an example of a motion vector field of sub-blocks of a block, according to some examples;
[0048] Figure 9 is a diagram that illustrates an example of motion vector prediction in affine inter mode (AF_INTER), according to some examples;
[0049] Figure 10A and Figure 10B are diagrams that illustrate an example of motion vector prediction in affine merge mode (AF_MERGE), according to some examples;
[0050] Figure 11A is a diagram that illustrates an example of a current block and a current affine template of the current block, according to some examples;
[0051] Figure 11B is a diagram that illustrates a current block with a current affine template and a reference block of a reference picture with a reference affine template, according to some examples;
[0052] Figure 11C is a diagram that illustrates an example of a sub-block motion vector field of a block, according to some examples;
[0053] Figure 12 is a diagram that illustrates an example of template matching based motion estimation for frame rate up-conversion (FRUC), according to some examples;
[0054] Figure 13 is a diagram that illustrates an example of bilateral matching based motion estimation for frame rate up-conversion (FRUC), according to some examples;
[0055] Figure 14 is a flow chart illustrating an example of a process for deriving one or more sets of affine motion parameters at a decoder, according to some examples;
[0056] Figure 15 is a flow chart illustrating an example of a process for encoding video data, according to some examples;
[0057] Figure 16 is a block diagram that illustrates an example video encoding device, according to some examples; and
[0058] Figure 17 is a block diagram that illustrates an example video decoding device, according to some examples.
DETAILED DESCRIPTION [0059] Certain aspects and implementations are provided below. Some of these aspects and implementations can be applied independently and some of them can be applied in combination, as would be apparent to those skilled in the art. In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of various implementations. However, it will be apparent that various implementations can be practiced without these specific details. The Figures and description are not intended to be restrictive.
[0060] The ensuing description provides exemplary implementations only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary implementations will provide those skilled in the art with an enabling description for implementing an example. It should be understood that various changes can be made in the function and arrangement of elements without departing from the spirit and scope of the application as set forth in the appended claims.
[0061] Specific details are given in the description below to provide a thorough understanding of the different implementations. However, it will be understood by one of ordinary skill in the art that the implementations can be practiced without these specific details. For example, circuits, systems, networks, processes, and other components can be shown as components in block diagram form in order not to obscure the examples in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques can be shown without unnecessary detail in order to avoid obscuring the examples.
[0062] Also, it is noted that individual implementations can be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart can describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but it could have additional steps not included in a Figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or to the main function.
[0063] The term computer-readable medium includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium can include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium can include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disc (CD) or digital versatile disc (DVD), flash memory, memory, or memory devices. A computer-readable medium can have stored thereon code and/or machine-executable instructions that can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc., can be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.
[0064] Furthermore, various examples can be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (for example, a computer program product) can be stored in a computer-readable or machine-readable medium. A processor (or processors) can perform the necessary tasks.
[0065] As more devices and systems provide consumers with the ability to consume digital video data, the need for efficient video coding techniques becomes more important. Video coding is needed to reduce the storage and transmission requirements necessary to handle the large amounts of data present in digital video data. Various video coding techniques can be used to compress video data in a manner that uses a low bit rate while maintaining high video quality.
[0066] Figure 1 is a block diagram illustrating an example of a video coding system 100 that includes an encoding device 104 and a decoding device 112. The encoding device 104 can be part of a source device, and the decoding device 112 can be part of a receiving device. The source device and/or the receiving device can include an electronic device, such as a mobile or stationary telephone handset (for example, a smartphone, cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video game console, a video streaming device, an Internet Protocol (IP) camera, or any other suitable electronic device. In some examples, the source device and the receiving device can include one or more wireless transceivers for wireless communications. The coding techniques described herein are applicable to video coding in various multimedia applications, including streaming video transmissions (for example, over the Internet), television broadcasts or transmissions, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 100 can support one-way or two-way video transmission to support applications such as video conferencing, video streaming, video playback, video broadcasting, gaming, and/or video telephony.
[0067] The encoding device 104 (or encoder) can be used to encode video data using a video coding standard or protocol to generate an encoded video bitstream. Examples of video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, and High Efficiency Video Coding (HEVC) or ITU-T H.265. Various extensions to HEVC that deal with multi-layer video coding exist, including the range and screen content coding extensions, 3D video coding (3D-HEVC), the multiview extension (MV-HEVC), and the scalable extension (SHVC). HEVC and its extensions were developed by the Joint Collaborative Team on Video Coding (JCT-VC), as well as the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V), of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). MPEG and ITU-T VCEG also formed a Joint Video Exploration Team (JVET) to explore new coding tools for the next generation of video coding standards. The reference software is called JEM (joint exploration model).
[0068] Many examples described herein use the JEM model, the HEVC standard, and/or extensions thereof. However, the techniques and systems described herein may also be applicable to other coding standards, such as AVC, MPEG, extensions thereof, or other suitable coding standards that currently exist, or to future coding standards. Accordingly, although the techniques and systems described herein may be described with reference to a particular video coding standard, one of ordinary skill in the art will appreciate that the description should not be interpreted to apply only to that particular standard.
[0069] With reference to Figure 1, a video source 102 can provide the video data to the encoding device 104. The video source 102 can be part of the source device, or can be part of a device other than the source device. The video source 102 can include a video capture device (for example, a video camera, a camera phone, a video phone, or the like), a video archive containing stored video, a video server or content provider providing video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics video data, a combination of such sources, or any other suitable video source.
[0070] The video data from the video source 102 can include one or more input pictures or frames. A picture or frame of a video is a still image of a scene. The encoder engine 106 (or encoder) of the encoding device 104 encodes the video data to generate an encoded video bitstream. In some examples, an encoded video bitstream (or video bitstream or bitstream) is a series of one or more coded video sequences. A coded video sequence (CVS) includes a series of access units (AUs) starting with an AU that has a random access point picture in the base layer and with certain properties, up to and not including a next AU that has a random access point picture in the base layer and with certain properties. For example, the certain properties of a random access point picture that starts a CVS can include a RASL flag (for example, NoRaslOutputFlag) equal to 1. Otherwise, a random access point picture (with RASL flag equal to 0) does not start a CVS. An access unit (AU) includes one or more coded pictures and control information corresponding to the coded pictures that share the same output time. The coded slices of pictures are encapsulated at the bitstream level into data units called network abstraction layer (NAL) units. For example, an HEVC video bitstream can include one or more CVSs that include NAL units. Each NAL unit has a NAL unit header. In one example, the header is one byte for H.264/AVC (except for multi-layer extensions) and two bytes for HEVC. The syntax elements in the NAL unit header take on the designated bits and are therefore visible to all kinds of systems and transport layers, such as Transport Stream, Real-time Transport Protocol (RTP), File Format, among others.
[0071] Two classes of NAL units exist in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. A VCL NAL unit includes one slice or slice segment (described below) of coded picture data, and a non-VCL NAL unit includes control information that relates to one or more coded pictures. In some cases, a NAL unit can be referred to as a packet. An HEVC AU includes VCL NAL units containing coded picture data and non-VCL NAL units (if any) corresponding to the coded picture data.
[0072] The NAL units can contain a sequence of bits forming a coded representation of the video data (for example, an encoded video bitstream, a CVS of a bitstream, or the like), such as coded representations of pictures in a video. The encoder engine 106 generates coded representations of pictures by partitioning each picture into multiple slices. A slice is independent of other slices, so that the information in the slice is coded without dependency on data from other slices within the same picture. A slice includes one or more slice segments, including an independent slice segment and, if present, one or more dependent slice segments that depend on previous slice segments. The slices are then partitioned into coding tree blocks (CTBs) of luma samples and chroma samples. A CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a coding tree unit (CTU). A CTU is the basic processing unit for HEVC encoding. A CTU can be split into multiple coding units (CUs) of varying sizes. A CU contains luma and chroma sample arrays that are referred to as coding blocks (CBs).
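As a sketch of the recursive CTU-to-CU splitting just described (the split decision function `should_split` is a hypothetical placeholder; real encoders decide splits by rate-distortion optimization):

```python
# Recursively partition a CTU into CUs via a quadtree: either keep the
# block as one CU, or split it into four equal sub-blocks, down to a
# minimum size. Illustrative only.

def partition_ctu(x, y, size, min_size, should_split):
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += partition_ctu(x + dx, y + dy, half, min_size,
                                     should_split)
        return cus
    return [(x, y, size)]   # a leaf CU
```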
[0073] The luma and chroma CBs can be further split into prediction blocks (PBs). A PB is a block of samples of the luma component or a chroma component that uses the same motion parameters for inter-prediction or intra-block copy prediction (when available or enabled for use). The luma PB and one or more chroma PBs, together with associated syntax, form a prediction unit (PU). For inter-prediction, a set of motion parameters (for example, one or more motion vectors, reference indices, or the like) is signaled in the bitstream for each PU and is used for inter-prediction of the luma PB and the one or more chroma PBs. The motion parameters can also be referred to as motion information. A CB can also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component on which the same two-dimensional transform is applied for coding a prediction residual signal. A transform unit (TU) represents the TBs of luma and chroma samples, and corresponding syntax elements.
[0074] A size of a CU corresponds to a size of the coding mode and can be square in shape. For example, a size of a CU can be 8 x 8 samples, 16 x 16 samples, 32 x 32 samples, 64 x 64 samples, or any other appropriate size up to the size of the corresponding CTU. The phrase N x N is used herein to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions (for example, 8 pixels x 8 pixels). The pixels in a block can be arranged in rows and columns. In some examples, blocks do not have the same number of pixels in a horizontal direction as in a vertical direction. Syntax data associated with a CU can describe, for example, the partitioning of the CU into one or more PUs. Partitioning modes can differ between whether the CU is intra-prediction mode coded or inter-prediction mode coded. PUs can be partitioned to be non-square in shape. Syntax data associated with a CU can also describe, for example, the partitioning of the CU into one or more TUs according to a CTU. A TU can be square or non-square in shape.
[0075] According to the HEVC standard, transformations can be performed using transform units (TUs). The TUs can vary for different CUs. The TUs can be sized based on the size of PUs within a given CU. The TUs can be the same size as the PUs or smaller. In some examples, residual samples corresponding to a CU can be subdivided into smaller units using a quadtree structure known as a residual quadtree (RQT). Leaf nodes of the RQT can correspond to TUs. Pixel difference values associated with the TUs can be transformed to produce transform coefficients. The transform coefficients can then be quantized by the encoder engine 106.
[0076] Once the pictures of the video data are partitioned into CUs, the encoder engine 106 predicts each PU using a prediction mode. The prediction unit or prediction block is then subtracted from the original video data to obtain residuals (described below). For each CU, a prediction mode can be signaled inside the bitstream using syntax data. A prediction mode can include intra-prediction (or intra-picture prediction) or inter-prediction (or inter-picture prediction). Intra-prediction utilizes the correlation between spatially neighboring samples within a picture. For example, using intra-prediction, each PU is predicted from neighboring image data in the same picture using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, direction prediction to extrapolate from neighboring data, or any other suitable types of prediction. Inter-prediction uses the temporal correlation between pictures in order to derive a motion-compensated prediction for a block of image samples. For example, using inter-prediction, each PU is predicted using motion compensation prediction from image data in one or more reference pictures (before or after the current picture in output order). The decision whether to code a picture area using inter-picture or intra-picture prediction can be made, for example, at the CU level.
[0077] In some examples, one or more slices of a picture are assigned a slice type. The slice types include an I slice, a P slice, and a B slice. An I slice (intra-frames, independently decodable) is a slice of a picture that is only coded by intra-prediction and therefore is independently decodable, since the I slice requires only the data within the frame to predict any prediction unit or prediction block of the slice. A P slice (uni-directional predicted frames) is a slice of a picture that can be coded with intra-prediction and with uni-directional inter-prediction. Each prediction unit or prediction block within a P slice is coded either with intra-prediction or with inter-prediction. When inter-prediction applies, the prediction unit or prediction block is only predicted from one reference picture, and therefore reference samples are only from one reference region of one frame. A B slice (bi-directional predictive frames) is a slice of a picture that can be coded with intra-prediction and with inter-prediction (for example, either bi-prediction or uni-prediction). A prediction unit or prediction block of a B slice can be bi-directionally predicted from two reference pictures, where each picture contributes one reference region and sample sets of the two reference regions are weighted (for example, with equal weights or with different weights) to produce the prediction signal of the bi-directional predicted block. As explained above, slices of one picture are independently coded. In some cases, a picture can be coded as just one slice.
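A minimal sketch of the weighted combination used for bi-prediction in a B slice follows; equal weights are shown, though, as noted above, unequal weights are also possible:

```python
# Combine two motion-compensated reference blocks into one bi-predicted
# block as a per-pixel weighted average. Illustrative only.

def bi_predict(ref_block0, ref_block1, w0=0.5, w1=0.5):
    return [[w0 * p0 + w1 * p1 for p0, p1 in zip(r0, r1)]
            for r0, r1 in zip(ref_block0, ref_block1)]
```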
[0078] A PU can include the data (for example, motion parameters or other suitable data) related to the prediction process. For example, when the PU is encoded using intra-prediction, the PU can include data describing an intra-prediction mode for the PU. As another example, when the PU is encoded using inter-prediction, the PU can include data defining a motion vector for the PU. The data defining the motion vector for a PU can describe, for example, a horizontal component of the motion vector (Δx), a vertical component of the motion vector (Δy), a resolution for the motion vector (for example, integer precision, quarter-pixel precision, or eighth-pixel precision), a reference picture to which the motion vector points, a reference index, a reference picture list (for example, List 0, List 1, or List C) for the motion vector, or any combination thereof.
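For illustration, the per-PU motion data listed above could be bundled in a record such as the following sketch; the field names are hypothetical and do not correspond to the standard's syntax elements:

```python
from dataclasses import dataclass

# A hypothetical record bundling per-PU motion data. Illustrative only.
@dataclass
class MotionInfo:
    delta_x: float    # horizontal MV component
    delta_y: float    # vertical MV component
    precision: str    # e.g. "integer", "quarter-pel", "eighth-pel"
    ref_idx: int      # index into the reference picture list
    ref_list: int     # 0 for List 0, 1 for List 1
```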
[0079] The encoding device 104 can then perform transformation and quantization. For example, following prediction, the encoder engine 106 can calculate residual values corresponding to the PU. The residual values can comprise pixel difference values between the current block of pixels being coded (the PU) and the prediction block used to predict the current block (for example, the predicted version of the current block). For example, after generating a prediction block (for example, issuing inter-prediction or intra-prediction), the encoder engine 106 can generate a residual block by subtracting the prediction block produced by a prediction unit from the current block. The residual block includes a set of pixel difference values that quantify the differences between pixel values of the current block and pixel values of the prediction block. In some examples, the residual block can be represented in a two-dimensional block format (for example, a two-dimensional matrix or array of pixel values). In such examples, the residual block is a two-dimensional representation of the pixel values.
[0080] Any residual data that may remain after the prediction is performed is transformed using a block transform, which can be based on a discrete cosine transform, a discrete sine transform, an integer transform, a wavelet transform, another suitable transform function, or any combination thereof. In some cases, one or more block transforms (for example, sizes 32 x 32, 16 x 16, 8 x 8, 4 x 4, or the like) can be applied to the residual data in each CU. In some examples, a TU can be used for the transform and quantization processes implemented by the encoder engine 106. A given CU having one or more PUs can also include one or more TUs. As described in further detail below, the residual values can be transformed into transform coefficients using the block transforms, and then can be quantized and scanned using TUs to produce serialized transform coefficients for entropy coding.
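As an illustrative sketch of applying one of the block transforms named above, a floating-point 2-D discrete cosine transform can be applied to a residual block with SciPy; HEVC itself uses fixed-point integer approximations of the transform, so this is illustrative only:

```python
import numpy as np
from scipy.fft import dctn

# An 8x8 residual block with small signed sample differences.
residual = np.random.randint(-16, 16, size=(8, 8))
coeffs = dctn(residual, norm="ortho")   # 8x8 transform coefficients
```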
[0081] In some examples, following intra-predictive or inter-predictive coding using PUs of a CU, the encoder engine 106 can calculate residual data for the TUs of the CU. The PUs can comprise pixel data in the spatial domain (or pixel domain). The TUs can comprise coefficients in the transform domain following application of a block transform. As previously noted, the residual data can correspond to pixel difference values between pixels of the unencoded picture and prediction values corresponding to the PUs. The encoder engine 106 can form the TUs including the residual data for the CU, and can then transform the TUs to produce transform coefficients for the CU.
[0082] The encoder engine 106 can perform quantization of the transform coefficients. Quantization provides further compression by quantizing the transform coefficients to reduce the amount of data used to represent the coefficients. For example, quantization can reduce the bit depth associated with some or all of the coefficients. In one example, a coefficient with an n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
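A minimal sketch of the bit-depth reduction example above, reducing an n-bit coefficient magnitude to m bits by discarding the low-order bits:

```python
# Round an n-bit coefficient down to an m-bit value (n > m) by dropping
# the low (n - m) bits of its magnitude. Illustrative only.
def quantize(coeff, n_bits, m_bits):
    shift = n_bits - m_bits
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> shift)
```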
[0083] Once quantization is performed, the coded video bitstream includes quantized transform coefficients, prediction information (for example, prediction modes, motion vectors, block vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the coded video bitstream can then be entropy encoded by the encoder engine 106. In some examples, the encoder engine 106 can utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, the encoder engine 106 can perform an adaptive scan. After scanning the quantized transform coefficients to form a vector (for example, a one-dimensional vector), the encoder engine 106 can entropy encode the vector. For example, the encoder engine 106 can use context-adaptive variable length coding, context-adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another suitable entropy coding technique.
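A sketch of serializing quantized coefficients with one predefined scan order (a simple diagonal scan) before entropy coding; actual codecs select among several scan patterns and then apply, for example, CABAC:

```python
# Serialize an n x n block of quantized coefficients along anti-diagonals
# into a one-dimensional vector. Illustrative only.
def diagonal_scan(block):
    n = len(block)
    order = sorted(((x, y) for y in range(n) for x in range(n)),
                   key=lambda p: (p[0] + p[1], p[1]))
    return [block[y][x] for (x, y) in order]
```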
[0084] As previously described, an HEVC bitstream includes a group of NAL units, including VCL NAL units and non-VCL NAL units. The VCL NAL units include coded picture data forming a coded video bitstream. For example, a sequence of bits forming the coded video bitstream is present in the VCL NAL units. Non-VCL NAL units can contain parameter sets with high-level information relating to the encoded video bitstream, in addition to other information. For example, a parameter set can include a video parameter set (VPS), a sequence parameter set (SPS), and a picture parameter set (PPS). Examples of goals of the parameter sets include bit rate efficiency, error resiliency, and providing systems layer interfaces. Each slice references a single active PPS, SPS, and VPS to access information that the decoding device 112 can use for decoding the slice. An identifier (ID) can be coded for each parameter set, including a VPS ID, an SPS ID, and a PPS ID. An SPS includes an SPS ID and a VPS ID. A PPS includes a PPS ID and an SPS ID. Each slice header includes a PPS ID. Using the IDs, the active parameter sets can be identified for a given slice.
[0085] A PPS includes information that applies to all slices in a given picture. Because of this, all slices in a picture refer to the same PPS. Slices in different pictures can also refer to the same PPS. An SPS includes information that applies to all pictures in a same coded video sequence (CVS) or bitstream. As previously described, a coded video sequence is a series of access units (AUs) that starts with a random access point picture (for example, an instantaneous decoding refresh (IDR) picture or broken link access (BLA) picture, or other appropriate random access point picture) in the base layer and with certain properties (described above), up to and not including a next AU that has a random access point picture in the base layer and with certain properties (or the end of the bitstream). The information in an SPS may not change from picture to picture within a coded video sequence. Pictures in a coded video sequence can use the same SPS. The VPS includes information that applies to all layers within a coded video sequence or bitstream. The VPS includes a syntax structure with syntax elements that apply to entire coded video sequences. In some examples, the VPS, SPS, or PPS can be transmitted in-band with the encoded bitstream. In some examples, the VPS, SPS, or PPS can be transmitted out-of-band in a separate transmission from the NAL units containing coded video data.
[0086] A video bitstream can also include Supplemental Enhancement Information (SEI) messages. For example, an SEI NAL unit can be part of the video bitstream. In some examples, an SEI message can be outside of the video bitstream. In some cases, an SEI message can contain information that is not needed by the decoding process. For example, the information in an SEI message may not be essential for the decoder to decode the video pictures of the bitstream, but the decoder can use the information to improve the display or processing of the pictures (for example, the decoded output). The information in an SEI message can be embedded metadata. In one illustrative example, the information in an SEI message could be used by decoder-side entities to improve the viewability of the content. In some instances, certain application standards may mandate the presence of such SEI messages in the bitstream so that the improvement in quality can be brought to all devices that conform to the application standard (for example, the carriage of the frame-packing SEI message for the frame-compatible plano-stereoscopic 3DTV video format, where the SEI message is carried for every frame of the video, handling of a recovery point SEI message, use of the pan-scan scan rectangle SEI message in DVB, in addition to many other examples).
[0087] The output 110 of the encoding device 104 can send the NAL units making up the encoded video data over the communications link 120 to the decoding device 112 of the receiving device. The input 114 of the decoding device 112 can receive the NAL units. The communications link 120 can include a channel provided by a wireless network, a wired network, or a combination of a wired and wireless network. A wireless network can include any wireless interface or combination of wireless interfaces and can include any suitable wireless network (for example, the Internet or other wide area network, a packet-based network, WiFi™, radio frequency (RF), UWB, WiFi-Direct, cellular, Long-Term Evolution (LTE), WiMax™, or the like). A wired network can include any wired interface (for example, fiber, Ethernet, powerline Ethernet, Ethernet over coaxial cable, digital signal line (DSL), or the like). The wired and/or wireless networks can be implemented using various equipment, such as base stations, routers, access points, bridges, gateways, switches, or the like. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.
[0088] In some examples, the encoding device 104 can store the encoded video data in storage 108. The output 110 can retrieve the encoded video data from the encoder engine 106 or from the storage 108. The storage 108 can include any of a variety of distributed or locally accessed data storage media. For example, the storage 108 can include a hard drive, a storage disc, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
[0089] The input 114 of the decoding device 112 receives the encoded video bitstream data and can provide the video bitstream data to the decoder engine 116, or to storage 118 for later use by the decoder engine 116. The decoder engine 116 can decode the encoded video bitstream data by entropy decoding (for example, using an entropy decoder) and extracting the elements of one or more coded video sequences making up the encoded video data. The decoder engine 116 can then rescale and perform an inverse transform on the encoded video bitstream data. The residual data is then passed to a prediction stage of the decoder engine 116. The decoder engine 116 then predicts a block of pixels (for example, a PU). In some examples, the prediction is added to the output of the inverse transform (the residual data).
[0090] The decoding device 112 can output the decoded video to a video destination device 122, which can include a display or other output device for displaying the decoded video data to a consumer of the content. In some aspects, the video destination device 122 can be part of the receiving device that includes the decoding device 112. In some aspects, the video destination device 122 can be part of a separate device other than the receiving device.
[0091] In some examples, the video encoding device 104 and/or the video decoding device 112 can be integrated with an audio encoding device and an audio decoding device, respectively. The video encoding device 104 and/or the video decoding device 112 can also include other hardware or software that is necessary to implement the coding techniques described above, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. The video encoding device 104 and the video decoding device 112 can be integrated as part of a combined encoder/decoder (codec) in a respective device. An example of specific details of the encoding device 104 is described below with reference to Figure 16. An example of specific details of the decoding device 112 is described below with reference to Figure 17.
[0092] Extensions to the HEVC standard include the Multi-View Video Encoding extension, referred to as MV-HEVC, and the Scalable Video Encoding extension, referred to as SHVC. The MV-HEVC and SHVC extensions share the concept of layered encryption, where different layers are included in the encoded video bit stream. Each layer in an encrypted video sequence is addressed by a unique layer identifier (ID). A layer ID can be present in the header of a NAL unit to identify the layer with which the NAL unit is associated. In MV-HEVC, different layers usually represent different views of the same scene in the video bit stream. In SHVC, different scalable layers are provided, which represent the video bit stream at different spatial resolutions (or figure resolutions) or at different reconstruction fidelities. The scalable layers can include a base layer (with layer ID = 0) and one or more enhancement layers (with layer IDs = 1, 2, ... n). The base layer can conform to a profile of the first version of HEVC and represents the lowest available layer in a bit stream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate, and/or reconstruction fidelity (or quality) compared to the base layer. The enhancement layers are hierarchically organized and may (or may not) depend on the lower layers. In some examples, the different layers can be encrypted using a single-standard codec (for example, all layers are encoded using HEVC, SHVC, or another encryption standard). In some examples, different layers can be encrypted using a multi-standard codec. For example, a base layer can be encrypted using AVC, while one or more enhancement layers can be encrypted using the SHVC and/or MV-HEVC extensions to the HEVC standard.
[0093] As described above, for each block, a set of motion information (also referred to in this document as motion parameters) can be available. A set of motion information can contain motion information for forward and backward prediction directions. Here, the forward and backward prediction directions are the two prediction directions of a bidirectional prediction mode, and the terms forward and backward do not necessarily have a geometric meaning. Instead, forward and backward correspond to a reference figure list 0 (RefPicList0) and a reference figure list 1 (RefPicList1) of a current figure, slice, or block. In some examples, when only one reference figure list is available for a figure, slice, or block, only RefPicList0 is available, and the motion information of each block in a slice is always forward. In some examples, RefPicList0 includes reference figures that precede the current figure in time, and RefPicList1 includes reference figures that follow the current figure in time. In some cases, a motion vector together with an associated reference index can be used in decoding processes. Such a motion vector with its associated reference index is denoted as a uni-predictive set of motion information.
[0094] For each prediction direction, the motion information can contain a reference index and a motion vector. In some cases, for simplicity, a motion vector may itself be referred to in a way that assumes that it has an associated reference index. A reference index can be used to identify a reference figure in the current reference figure list (RefPicList0 or RefPicList1). A motion vector can have a horizontal and a vertical component that provide an offset from the coordinate position in the current figure to the coordinates in the reference figure identified by the reference index. For example, a reference index can indicate a particular reference figure that should be used for a block in a current figure, and the motion vector can indicate where in the reference figure the best matching block (the block that best matches the current block) is located.
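In an illustrative, non-limiting example, a uni-predictive set of motion information and the offset it provides can be sketched as follows in Python; the class, field, and function names are illustrative assumptions and do not correspond to any standard syntax.

```python
from dataclasses import dataclass

@dataclass
class MotionVector:
    """Horizontal and vertical offsets, in pixels, from a position in the
    current figure to the corresponding position in the reference figure."""
    dx: int
    dy: int

@dataclass
class MotionInfo:
    """A uni-predictive set of motion information: one reference index
    (into RefPicList0 or RefPicList1) plus one motion vector."""
    ref_list: int  # 0 for RefPicList0, 1 for RefPicList1
    ref_idx: int   # index of the reference figure within that list
    mv: MotionVector

def locate_reference_block(x: int, y: int, info: MotionInfo):
    """Map the top left coordinate (x, y) of a block in the current figure
    to the top left coordinate of the best matching block in the reference
    figure identified by (ref_list, ref_idx)."""
    return x + info.mv.dx, y + info.mv.dy

# A block at (64, 32) referencing figure 2 of list 0 with MV (-5, 3):
info = MotionInfo(ref_list=0, ref_idx=2, mv=MotionVector(dx=-5, dy=3))
print(locate_reference_block(64, 32, info))  # (59, 35)
```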
[0095] A picture order count (POC) can be used in video encryption standards to identify the display order of a figure. Although there can be cases in which two figures within one encrypted video sequence have the same POC value, two figures with the same POC value do not typically occur within an encrypted video sequence. When multiple encrypted video sequences are present in a bit stream, figures with the same POC value may be closer to each other in terms of decoding order. The POC values of the figures can be used for reference figure list construction, derivation of the reference figure set as in HEVC, and/or motion vector scaling, among other things.
[0096] In H.264/AVC, each inter macroblock (MB) can be partitioned in four different ways, including: one 16x16 macroblock partition; two 16x8 macroblock partitions; two 8x16 macroblock partitions; and four 8x8 macroblock partitions, among others. Different macroblock partitions in a macroblock can have different reference index values for each prediction direction (for example, different reference indices for RefPicList0 and RefPicList1).
[0097]
In some cases, when a macroblock is not partitioned into four 8x8 macroblock partitions, the macroblock can have only one motion vector for each macroblock partition in each prediction direction. In some cases, when a macroblock is partitioned into four 8x8 macroblock partitions, each 8x8 macroblock partition can be further partitioned into sub-blocks, each of which can have a different motion vector in each prediction direction. An 8x8 macroblock partition can be divided into sub-blocks in different ways, including: one 8x8 sub-block; two 8x4 sub-blocks; two 4x8 sub-blocks; and four 4x4 sub-blocks, among others. Each sub-block can have a different motion vector in each prediction direction. Therefore, a motion vector can be present at a level equal to or higher than the sub-block level.
[0098]
In HEVC, the largest encryption unit in a slice is called an encryption tree block (CTB). A CTB contains a quaternary tree, the nodes of which are encryption units. The size of a CTB can vary from 16x16 pixels to 64x64 pixels in the main HEVC profile. In some cases, CTB sizes of 8x8 pixels can be supported. A CTB can be recursively divided into encryption units (CUs) in a quaternary tree manner, as shown in Figure 2. A CU can be the same size as a CTB and as small as 8x8 pixels. In some cases, each encryption unit is encrypted with either an intraprediction or an interprediction mode. When a CU is encrypted using an interprediction mode, the CU can be further partitioned into two or four prediction units (PUs), or it can be treated as one PU when further partitioning does not apply. When two PUs are present in a CU, the two PUs can be half-size rectangles or two rectangles that are 1/4 or 3/4 the size of the CU.
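In an illustrative, non-limiting example, the recursive quaternary tree division of a CTB into CUs can be sketched as follows in Python. The split decision is left as a caller-supplied stub, since in a real encoder it would be driven by a rate distortion cost; all names are illustrative assumptions.

```python
def split_ctb(x, y, size, min_cu_size, should_split):
    """Recursively divide a CTB whose top left corner is (x, y) and whose
    width/height is `size` into CUs in a quaternary tree manner. Returns a
    list of (x, y, size) tuples, one per CU. `should_split` stands in for
    the encoder's split decision (normally driven by rate distortion cost)."""
    if size <= min_cu_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf node: this region becomes one CU
    half = size // 2
    cus = []
    # A quaternary tree split always divides a node into four equal squares.
    for cx, cy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        cus.extend(split_ctb(cx, cy, half, min_cu_size, should_split))
    return cus

# Example: split a 64x64 CTB wherever the region is larger than 16x16.
cus = split_ctb(0, 0, 64, min_cu_size=8, should_split=lambda x, y, s: s > 16)
print(len(cus))  # 16 CUs, each 16x16
```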
[0099] Figure 3 is a diagram that illustrates eight partition modes for a CU encrypted with an interprediction mode. As shown, the partition modes include PART_2Nx2N, PART_2NxN, PART_Nx2N, PART_NxN, PART_2NxnU, PART_2NxnD, PART_nLx2N, and PART_nRx2N. A CU can be partitioned into PUs according to the different partition modes. Consequently, a CU can be predicted using one or more of the partition modes.
[0100] When the CU is inter-encrypted, one set of motion information can be present for each PU. In addition, each PU can be encrypted with a unique interprediction mode to derive the set of motion information. In some cases, when a CU is encrypted using the intraprediction mode, the PU shapes can be 2Nx2N and NxN. Within each PU, a single intraprediction mode is encrypted (while the chroma prediction mode is signaled at the CU level). In some cases, the NxN intra PU shapes are allowed when the current CU size is equal to the smallest CU size defined in the SPS.
[0101] For motion prediction in HEVC, there can be two interprediction modes for a CU or PU, including a fusion mode and an advanced motion vector prediction (AMVP) mode. A skip mode is considered a special case of the fusion mode. In either the AMVP or the fusion mode, a motion vector (MV) candidate list can be maintained for multiple motion vector predictors. The motion vector (or vectors) of the current PU, as well as the reference indices in the fusion mode, can be generated by taking one candidate from the MV candidate list.
[0102] In some examples, the MV candidate list can contain up to five MV candidates for the fusion mode and two MV candidates for the AMVP mode. In other examples, different numbers of candidates can be included in an MV candidate list for the fusion mode and/or the AMVP mode. A fusion candidate can contain a set of motion information (for example, motion vectors that correspond to one or both of the reference figure lists (list 0 and list 1)) and the reference indices. When a fusion candidate is identified by a fusion index, the reference figures are used for the prediction of the current block. The reference figures are also used to determine the associated motion vectors. An AMVP candidate contains only a motion vector; thus, in the AMVP mode, a reference index may need to be explicitly signaled, together with an MVP index to the MV candidate list, for each potential prediction direction from list 0 or list 1. In the AMVP mode, the predicted motion vectors can be further refined.
[0103] As can be seen above, a fusion candidate corresponds to a full set of motion information, while an AMVP candidate contains only one motion vector for a specific prediction direction and a reference index. The candidates for both the fusion mode and the AMVP mode can be derived in a similar way from the same spatial and/or temporal neighboring blocks.
[0104] Figure 4A and Figure 4B are diagrams that illustrate example derivations of spatial neighboring MV candidates. The spatial MV candidates for a specific PU (PU0 402) can be derived from neighboring blocks, including with respect to a neighboring PU (PU1 404) located to the right of PU0 402.
[0105] The diagram in Figure 4A illustrates the derivation of spatial MV candidates for the fusion mode. In the fusion mode, up to five spatial MV candidates (and, in some cases, up to four) can be derived, for example, in the following order: a left candidate 410 (block 0), an above candidate 412 (block 1), an above-right candidate 414 (block 2), a below-left candidate 416 (block 3), and an above-left candidate 418 (block 4). The locations of the spatial MV candidates with respect to PU0 402 are illustrated in Figure 4A. Specifically, the left candidate 410 is located adjacent to and to the left of the lower left corner of PU0 402; the above candidate 412 is located adjacent to and above the upper right corner of PU0 402; the above-right candidate 414 is located adjacent to and above the upper left corner of the neighboring PU1 404; the below-left candidate 416 is located below the left candidate 410; and the above-left candidate 418 is located above and to the left of the upper left corner of PU0 402.
[0106] The diagram in Figure 4B illustrates the derivation of spatial neighboring MV candidates for the AMVP mode. In the AMVP mode, the neighboring blocks are divided into, for example, two groups. The first group, which can be referred to as the left group, can include a first block 420 (block 0), located below and to the left of PU0 402, and a second block 422 (block 1), located to the left of and adjacent to the lower left corner of PU0 402. The second group, which can be referred to as the above group, can include a third block 424 (block 2), located above and adjacent to the upper left corner of the neighboring PU1 404, a fourth block 426 (block 3), located above and adjacent to the upper right corner of PU0 402, and a fifth block 428 (block 4), located above and to the left of the upper left corner of PU0 402. For each group, a potential MV candidate in a neighboring block that refers to the same reference figure as that indicated by the signaled reference index can have the highest priority among the blocks to be chosen to form a final candidate of the group. In some cases, it is possible that none of the neighboring blocks contains a motion vector pointing to the same reference figure. Therefore, if such a candidate cannot be found, the first available candidate can be scaled to form the final candidate, so that temporal distance differences can be compensated.
[0107] In some cases, the fusion and AMVP modes may include other aspects, such as motion vector scaling, artificial motion vector candidate generation, and a pruning process for candidate insertion.
[0108] A quaternary tree plus binary tree (QTBT) structure has been proposed for the future video encryption standard beyond HEVC. Simulations have shown that the proposed QTBT structure can be more efficient than the quaternary tree structure used in HEVC. In the proposed QTBT structure, a CTB is first partitioned using a quaternary tree structure, where the quaternary tree separation of one node can be iterated until the node reaches the minimum allowed quaternary tree leaf node size (MinQTSize). If the quaternary tree leaf node size is not greater than the maximum allowed binary tree root node size (MaxBTSize), it can be further partitioned by a binary tree. The binary tree separation of one node can be iterated until the node reaches the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The binary tree leaf node is called a CU, which can be used for prediction (for example, intraprediction or interprediction) and transform without any further partitioning. In some cases, there are two separation types in the binary tree separation: horizontal symmetric separation and vertical symmetric separation.
[0109] In an illustrative example of the QTBT partition structure, the CTU size can be set to 128x128 (a 128x128 block of luma samples and two corresponding 64x64 blocks of chroma samples), MinQTSize can be set to 16x16, MaxBTSize can be set to 64x64, MinBTSize (for both width and height) can be set to 4, and MaxBTDepth can be set to 4. The quaternary tree partitioning is applied to the CTU first to generate quaternary tree leaf nodes. In some examples, the quaternary tree leaf nodes can have sizes from 16x16 (in this case, MinQTSize) to 128x128 (in this case, the CTU size). If the quaternary tree leaf node is 128x128, it will not be further divided by the binary tree, since its size exceeds MaxBTSize (in this case, 64x64). Otherwise, the quaternary tree leaf node can be further partitioned by the binary tree. In this example, the quaternary tree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4 in this example), it implies that there is no additional separation. When the binary tree node has a width equal to MinBTSize (4 in this example), it implies that there is no additional horizontal separation. Similarly, when the binary tree node has a height equal to MinBTSize (4 in this example), it implies that there is no additional vertical separation. The leaf nodes of the binary tree are called CUs and can be further processed by prediction and transform without any further partitioning.
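In an illustrative, non-limiting example, the split rules of the preceding paragraph can be sketched as follows in Python. For brevity, the sketch tracks only square nodes, while a full implementation would track width and height separately because binary separations produce rectangular nodes; all names are illustrative assumptions.

```python
def allowed_splits(size, bt_depth, in_quadtree,
                   MinQTSize=16, MaxBTSize=64, MinBTSize=4, MaxBTDepth=4):
    """Return the separations permitted for a square node of width/height
    `size` under the QTBT rules sketched above."""
    splits = []
    # Quaternary tree stage: iterate until the MinQTSize leaf size.
    if in_quadtree and size > MinQTSize:
        splits.append("quad")
    # A quaternary tree leaf enters the binary tree only if it is not
    # larger than MaxBTSize; binary separation then continues until
    # MinBTSize or MaxBTDepth is reached.
    if size <= MaxBTSize and bt_depth < MaxBTDepth and size > MinBTSize:
        splits.extend(["horizontal", "vertical"])
    return splits

print(allowed_splits(128, 0, in_quadtree=True))  # ['quad']: exceeds MaxBTSize
print(allowed_splits(32, 1, in_quadtree=False))  # ['horizontal', 'vertical']
```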
[0110] Figure 5A illustrates an example of block partitioning using QTBT, and Figure 5B illustrates the corresponding tree structure. The solid lines shown in Figure 5A indicate the quaternary tree separation, and the dotted lines indicate the binary tree separation. At each separation node (referred to as a non-leaf node) of the binary tree, a marker can be signaled to indicate which separation type (for example, horizontal or vertical separation) is used. In an illustrative example, a marker value of 0 can indicate horizontal separation and a value of 1 can indicate vertical separation. In some cases, for the quaternary tree separation, there may be no need to indicate the separation type, since a block is always separated both horizontally and vertically into four sub-blocks of equal size.
[0111] In some examples, a multi-type tree structure can be used. For example, a tree node can be further separated with multiple tree types, such as a binary tree, a symmetric center-side triple tree, and a quaternary tree. Simulations have shown that the multi-type tree structure can be much more efficient than the QTBT structure.
[0112] In some cases, asymmetric encryption units can be used on top of the QTBT structure. For example, four new binary tree separation modes can be introduced into the QTBT framework, allowing new separation configurations. Figure 6 is a diagram illustrating asymmetric separation modes that can be used in addition to the separation modes already available in QTBT. According to the additional asymmetric separation modes, an encryption unit of size S is divided into two sub-CUs of sizes S/4 and 3S/4, either in the horizontal or in the vertical direction. In JVET-D0064, the newly added CU width or height can only be 12 or 24.
[0113] In HEVC and earlier video encryption standards, only a translational motion model is applied for motion compensation prediction (MCP). For example, a translational motion vector can be determined for each block (for example, each CU or PU) in a figure. However, in the real world there are more types of motion besides translational motion, including zooming (for example, zooming in and/or out), rotation, and perspective motions, among other irregular motions. In the Joint Exploration Model (JEM) of ITU-T VCEG and MPEG, a simplified affine transform motion compensation prediction can be applied to improve encryption efficiency. As shown in Figure 7, the affine motion field of a current block 702 is described by the motion vectors $v_0$ and $v_1$ of two control points 710 and 712. Using the motion vector $v_0$ of control point 710 and the motion vector $v_1$ of control point 712, the motion vector field (MVF) of the current block 702 can be described by the following equation:

$$v_x = \frac{(v_{1x} - v_{0x})}{w}\,x - \frac{(v_{1y} - v_{0y})}{w}\,y + v_{0x}, \qquad v_y = \frac{(v_{1y} - v_{0y})}{w}\,x + \frac{(v_{1x} - v_{0x})}{w}\,y + v_{0y} \qquad \text{Equation (1)}$$

[0114] where $v_x$ and $v_y$ form the motion vector for each pixel in the current block 702, x and y give the position of each pixel in the current block 702 (for example, the top left pixel in a block can have coordinate or index (x, y) = (0, 0)), $(v_{0x}, v_{0y})$ is the motion vector of the top left corner control point 710, w is the width of the current block 702, and $(v_{1x}, v_{1y})$ is the motion vector of the top right corner control point 712. The values $v_{0x}$ and $v_{1x}$ are the horizontal values for the respective motion vectors, and the values $v_{0y}$ and $v_{1y}$ are the vertical values for the respective motion vectors. Additional control points (for example, four control points, six control points, eight control points, or some other number of control points) can be defined by adding additional control point vectors, for example, at the lower corners of the current block 702, at the center of the current block 702, or at another position in the current block 702.
[0115] Equation (1) above illustrates a 4-parameter motion model, in which the four affine parameters a, b, c, and d are defined as: $a = (v_{1x} - v_{0x})/w$, $b = (v_{1y} - v_{0y})/w$, $c = v_{0x}$, and $d = v_{0y}$. Using equation (1), given the motion vector $(v_{0x}, v_{0y})$ of the top left corner control point 710 and the motion vector $(v_{1x}, v_{1y})$ of the top right corner control point 712, the motion vector for each pixel of the current block can be calculated using the coordinate (x, y) of each pixel location. For example, for the top left pixel position of the current block 702, the value of (x, y) can be equal to (0, 0), in which case the motion vector for the top left pixel becomes $V_x = v_{0x}$ and $V_y = v_{0y}$.
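In an illustrative, non-limiting example, the 4-parameter model of equation (1) and paragraph [0115] can be evaluated as follows in Python; the function names are illustrative assumptions.

```python
def affine_params(v0, v1, w):
    """Derive the 4-parameter affine model (a, b, c, d) from the control
    point motion vectors v0 = (v0x, v0y) and v1 = (v1x, v1y) and the block
    width w, following the parameter definitions given for equation (1)."""
    a = (v1[0] - v0[0]) / w
    b = (v1[1] - v0[1]) / w
    return a, b, v0[0], v0[1]  # c = v0x and d = v0y

def mv_at(a, b, c, d, x, y):
    """Evaluate the 4-parameter model at pixel location (x, y), giving the
    per-pixel motion vector (Vx, Vy) of equation (1)."""
    return a * x - b * y + c, b * x + a * y + d

a, b, c, d = affine_params(v0=(2.0, 1.0), v1=(4.0, 2.0), w=16)
print(mv_at(a, b, c, d, 0, 0))   # (2.0, 1.0): the top left pixel takes v0
print(mv_at(a, b, c, d, 16, 0))  # (4.0, 2.0): the top right pixel takes v1
```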
[0116] In order to further simplify the MCP, block-based affine transform prediction can be applied. For example, as shown in Figure 8, a current block 802 can be divided into sub-blocks. The example shown in Figure 8 includes a 4x4 partition, with sixteen total sub-blocks. Any suitable partition and corresponding number of sub-blocks can be used. A motion vector can then be derived for each sub-block using equation (1). For example, to derive a motion vector for each of the 4x4 sub-blocks, the motion vector of the central sample of each sub-block (as shown in Figure 8) is calculated according to equation (1). The resulting motion vector can be rounded, for example, to 1/16 fractional accuracy or another suitable precision (for example, 1/4, 1/8, or the like). Motion compensation can then be applied using the motion vectors derived for the sub-blocks to generate the prediction for each sub-block. For example, a decoding device can receive the four affine parameters (a, b, c, d) that describe the motion vector $v_0$ of control point 810 and the motion vector $v_1$ of control point 812, and can calculate the motion vector per sub-block according to the pixel coordinate that describes the location of the central sample of each sub-block. After MCP, the high precision motion vector of each sub-block can be rounded, as noted above, and can be saved with the same precision as the translational motion vector.
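In an illustrative, non-limiting example, the block-based derivation of one motion vector per 4x4 sub-block, evaluated at the central sample and rounded to 1/16 fractional precision as described above, can be sketched as follows in Python; all names are illustrative assumptions.

```python
def subblock_mvs(a, b, c, d, block_w, block_h, sub=4, frac_bits=4):
    """Evaluate the 4-parameter affine model at the central sample of each
    `sub` x `sub` sub-block and round the result to 1/16-pel precision
    (frac_bits=4), as described above. Returns one MV per sub-block,
    keyed by the sub-block's top left corner."""
    q = 1 << frac_bits  # 16 steps per pixel gives 1/16 fractional accuracy
    mvs = {}
    for sy in range(0, block_h, sub):
        for sx in range(0, block_w, sub):
            x, y = sx + sub / 2.0, sy + sub / 2.0  # central sample
            vx = a * x - b * y + c
            vy = b * x + a * y + d
            mvs[(sx, sy)] = (round(vx * q) / q, round(vy * q) / q)
    return mvs

mvs = subblock_mvs(a=0.125, b=0.0625, c=2.0, d=1.0, block_w=16, block_h=16)
print(len(mvs))     # 16 sub-blocks for a 16x16 block
print(mvs[(0, 0)])  # MV of the top left 4x4 sub-block: (2.125, 1.375)
```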
[0117] In JEM, there are two affine motion modes: affine inter mode (AF_INTER) and affine fusion mode (AF_MERGE). Figure 9 is a diagram that illustrates an example of motion vector prediction in the AF_INTER mode. In some examples, when a CU has a width and a height greater than 8 pixels, the AF_INTER mode can be applied. An affine marker can be signaled in the bit stream for a block (for example, at the CU level) to indicate whether the AF_INTER mode has been applied to the block. As illustrated in the example of Figure 9, in the AF_INTER mode, a candidate list of motion vector pairs can be constructed using neighboring blocks. For example, for a sub-block 910 located in the upper left corner of a current block 902, a motion vector $v_0$ can be selected from the motion vectors of a neighboring block A 920 above and to the left of sub-block 910, a neighboring block B 922 above sub-block 910, and a neighboring block C 924 to the left of sub-block 910. As an additional example, for a sub-block 912 located in the upper right corner of the current block 902, a motion vector $v_1$ can be selected from a neighboring block D 926 and a neighboring block E 928 in the above and above-right directions, respectively. Given motion vectors $v_A$, $v_B$, $v_C$, $v_D$, and $v_E$ that correspond to blocks A 920, B 922, C 924, D 926, and E 928, respectively, the candidate list of motion vector pairs can be expressed as $\{(v_0, v_1) \mid v_0 \in \{v_A, v_B, v_C\},\ v_1 \in \{v_D, v_E\}\}$.
[0118] As noted above and as shown in Figure 9, in the AF_INTER mode, the motion vector $v_0$ can be selected from the motion vectors of blocks A 920, B 922, or C 924. The motion vector of the neighboring block (block A, B, or C) can be scaled according to the reference list and the relationship between the reference POC of the neighboring block, the reference POC of the current CU (for example, the current block 902), and the POC of the current CU. In these examples, some or all of the POCs can be determined from a reference list. The selection of $v_1$ from neighboring blocks D or E is similar to the selection of $v_0$.
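In an illustrative, non-limiting example, the POC-based scaling mentioned above can be sketched as follows in Python. The sketch uses floating point for clarity, while HEVC specifies an equivalent clipped fixed-point computation; the function and parameter names are illustrative assumptions.

```python
def scale_mv(mv, cur_poc, cur_ref_poc, nb_poc_diff):
    """Scale a neighboring block's motion vector so that it spans the POC
    distance between the current figure and the current reference figure.
    `nb_poc_diff` is the POC distance originally spanned by the
    neighboring MV."""
    tb = cur_poc - cur_ref_poc  # distance the scaled MV must span
    td = nb_poc_diff            # distance the neighboring MV spans
    s = tb / td
    return mv[0] * s, mv[1] * s

# A neighboring MV (8, -4) spanning 2 POCs, rescaled for a current block
# whose reference figure is 4 POCs away:
print(scale_mv((8, -4), cur_poc=16, cur_ref_poc=12, nb_poc_diff=2))  # (16.0, -8.0)
```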
[0119] In some cases, if the number of candidates in the candidate list is less than two, the candidate list can be filled with motion vector pairs built by duplicating each of the AMVP candidates. When the candidate list has more than two candidates, in some examples, the candidates in the candidate list can first be sorted according to the consistency of the neighboring motion vectors (for example, consistency can be based on the similarity between the two motion vectors in a motion vector pair candidate). In such examples, the first two candidates are retained and the rest can be discarded.
[0120] In some examples, a rate distortion (RD) cost check can be used to determine which motion vector pair candidate is selected as the control point motion vector prediction (CPMVP) of the current CU (for example, the current block 902). In some cases, an index indicating the position of the CPMVP in the candidate list can be signaled (or otherwise indicated) in the bit stream. Once the CPMVP of the current affine CU is determined (based on the motion vector pair candidate), affine motion estimation can be applied, and the control point motion vector (CPMV) can be determined. In some cases, the difference between the CPMV and the CPMVP can be signaled in the bit stream. Both the CPMV and the CPMVP include two sets of translational motion vectors, in which case the signaling cost of affine motion information is higher than that of translational motion.
[0121] Figure 10A and Figure 10B illustrate an example of motion vector prediction in the AF_MERGE mode. When a current block 1002 (for example, a CU) is encrypted using the AF_MERGE mode, a motion vector can be obtained from a valid neighboring reconstructed block. For example, the first block of the valid neighboring reconstructed blocks that is encrypted with the affine mode can be selected as the candidate block. As shown in Figure 10A, the neighboring block can be selected from a set of neighboring blocks A 1020, B 1022, C 1024, D 1026, and E 1028. The neighboring blocks can be considered in a specific selection order for being selected as the candidate block. An example of a selection order is the left neighbor (block A 1020), followed by the above neighbor (block B 1022), then the above-right neighbor (block C 1024), then the below-left neighbor (block D 1026), and then the above-left neighbor (block E 1028).
[0122] As noted above, the neighboring block that is selected can be the first block (for example, in the selection order) that has been encrypted with the affine mode. For example, block A 1020 may have been encrypted in the affine mode. As shown in Figure 10B, block A 1020 can be included in a neighboring CU 1004. For the neighboring CU 1004, the motion vectors for the top left corner ($v_2$ 1030), the top right corner ($v_3$ 1032), and the bottom left corner ($v_4$ 1034) of the neighboring CU 1004 may have been derived. In this example, a control point motion vector $v_0$ 1040 for the top left corner of the current block 1002 is calculated according to $v_2$ 1030, $v_3$ 1032, and $v_4$ 1034. The control point motion vector $v_1$ 1042 for the top right corner of the current block 1002 can then be determined.
[0123] Once the control point motion vectors (CPMVs) $v_0$ 1040 and $v_1$ 1042 of the current block 1002 have been derived, equation (1) can be applied to determine a motion vector field for the current block 1002. In order to identify whether the current block 1002 is encrypted with the AF_MERGE mode, an affine marker can be included in the bit stream when there is at least one neighboring block encrypted in the affine mode.
[0124] In many cases, the affine motion estimation process includes determining the affine motion for a block on the encoder side while minimizing the distortion between the original block and the affine motion predicted block. Because affine motion has more parameters than translational motion, affine motion estimation can be more complicated than translational motion estimation. In some cases, a fast affine motion estimation method based on a Taylor expansion of the signal can be performed to determine the affine motion parameters (for example, the affine motion parameters a, b, c, d in a 4-parameter model).
[0125] Fast affine motion estimation can include a gradient-based affine motion search. For example, given a pixel value $I_t$ at time $t$ (where $t_0$ is the time of the reference figure), the first-order Taylor expansion for the pixel value $I_t$ can be determined as:

$$I_t = I_{t_0} + \frac{\partial I_{t_0}}{\partial x} \cdot V_x + \frac{\partial I_{t_0}}{\partial y} \cdot V_y \qquad \text{Equation (2)}$$

[0126] where $\partial I_{t_0}/\partial x$ and $\partial I_{t_0}/\partial y$ are the pixel gradients $G_{0x}$ and $G_{0y}$ in the x and y directions, respectively, while $V_x$ and $V_y$ indicate the motion vector components for the pixel value $I_t$. The motion vector for the pixel $I_t$ in the current block points to a pixel $I_{t_0}$ in the reference figure.

[0127] Equation (2) can be rewritten as equation (3) as follows:

$$I_t = I_{t_0} + G_{0x} \cdot V_x + G_{0y} \cdot V_y \qquad \text{Equation (3)}$$

[0128] The affine motion $V_x$ and $V_y$ for the pixel value $I_t$ can then be solved for by minimizing the distortion between the prediction and the original signal. Taking the 4-parameter affine model as an example:

$$V_x = a \cdot x - b \cdot y + c \qquad \text{Equation (4)}$$

$$V_y = b \cdot x + a \cdot y + d \qquad \text{Equation (5)}$$

[0129] where x and y indicate the position of a pixel or sub-block. Substituting equations (4) and (5) into equation (3) and then minimizing the distortion between the original signal and the prediction given by equation (3), the solution for the affine parameters a, b, c, d can be determined:

$$(a, b, c, d) = \arg\min \left\{ \sum_{i} \left( I_t^i - I_{t_0}^i - G_{0x}^i \cdot (a \cdot x_i - b \cdot y_i + c) - G_{0y}^i \cdot (b \cdot x_i + a \cdot y_i + d) \right)^2 \right\} \qquad \text{Equation (6)}$$

where i indexes the pixels over which the distortion is minimized.

[0130] Any number of parameters can be used. For example, a 6-parameter affine motion model or another affine motion model can be solved in the same way as described above for the 4-parameter affine motion model.
[0131] Once the affine motion parameters, which define the affine motion vectors for the control points, are determined, the motion vectors per pixel or per sub-block can be determined using the affine motion parameters (for example, using equations (4) and (5), which are also represented in equation (1)). Equation (3) can be evaluated for each pixel in a current block (for example, a CU). For example, if a current block is 16 pixels x 16 pixels, the least squares solution in equation (6) can then be used to derive the affine motion parameters (a, b, c, d) for the current block by minimizing the overall value over the 256 pixels.
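In an illustrative, non-limiting example, the minimization in equation (6) can be posed as a linear least squares problem, since substituting equations (4) and (5) into equation (3) makes the residual linear in (a, b, c, d). The following Python sketch shows one way this could be solved; the function and variable names are illustrative assumptions, and the sketch is not a mandated implementation.

```python
import numpy as np

def solve_affine(xs, ys, gx, gy, dI):
    """Solve equation (6) as a linear least squares problem. For pixel i at
    (xs[i], ys[i]), gx[i] and gy[i] are the gradients G0x, G0y at the
    reference pixel and dI[i] = I_t - I_t0 is the difference between the
    current pixel and its reference pixel. The residual is linear in
    (a, b, c, d):
        dI ~ a*(gx*x + gy*y) + b*(gy*x - gx*y) + c*gx + d*gy
    """
    A = np.stack([gx * xs + gy * ys,   # coefficient of a
                  gy * xs - gx * ys,   # coefficient of b
                  gx,                  # coefficient of c
                  gy], axis=1)         # coefficient of d
    params, *_ = np.linalg.lstsq(A, dI, rcond=None)
    return params  # (a, b, c, d)

# Synthetic check: build dI from known parameters and recover them.
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 16, 256), rng.uniform(0, 16, 256)
gx, gy = rng.normal(size=256), rng.normal(size=256)
a, b, c, d = 0.01, -0.02, 1.5, -0.5
dI = gx * (a * xs - b * ys + c) + gy * (b * xs + a * ys + d)
print(np.round(solve_affine(xs, ys, gx, gy, dI), 4))  # [ 0.01 -0.02  1.5  -0.5]
```

The synthetic check recovers the parameters exactly because the constructed system is noise-free and overdetermined; with real pixel data, the solve is repeated iteratively as described further below.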
[0132] Several problems arise when the affine motion modeling techniques described above are used. One problem is the high signaling cost of using the affine motion model. For example, the high signaling cost is due, at least in part, to the need for the affine motion parameters to be signaled in the bit stream in order for the decoder to derive the motion vectors for the pixels or sub-blocks of the blocks in the figures. Furthermore, affine motion derivation functions based on bilateral matching can be very complicated to solve, leading to the use of large amounts of processing resources.
[0133] Methods and systems are described in this document for performing decoder-side affine motion derivation, which addresses at least the problems noted above. Any of the techniques described in this document can be applied individually, or any suitable combination of the techniques can be applied. Using the techniques described in this document, a decoding device (also referred to as a video decoder or decoder) can determine affine motion parameters for one or more blocks of video figures. The techniques can be performed without requiring affine motion information to be sent to the decoding device. For example, affine motion parameters (or the differences between affine motion parameters and affine motion parameter predictors) need not be signaled in the bit stream for such an affine motion derivation mode to be performed by a decoding device. In some cases, translational motion can be referred to as a special affine motion.
[0134] Templates can be used for decoder-side affine motion derivation for a current block. Affine motion derivation using templates can be referred to as template matching based affine motion derivation. Template matching based affine motion derivation can be used to derive affine motion information (for example, affine motion vectors) on the decoder side. For example, a current template can include spatially reconstructed neighboring samples (for example, pixels) of a current block, and a reference template of a reference figure can include samples (for example, pixels) in the reference figure that correspond to the samples in the current template. The current template and the reference template can be used to determine affine motion parameters for the current block. The affine motion parameters define the affine motion vectors for the control points of the current template. For example, the affine motion parameters (for example, the a, b, c, d parameters, which define the motion vectors) of the control points of the current template can be derived by minimizing the error (or distortion) between the affine prediction (associated with the samples in the reference template) and the reconstructed samples of the current template of the current block. The derived affine motion parameters define the affine motion vectors for the control points. The affine motion vectors of the control points can then be used to determine motion vectors for pixels or sub-blocks of the current block.
[0135] In some examples, the current template of a current block (for which the affine motion is to be derived) is a block or a region of samples from one or more neighboring blocks, with the current template sharing one or more boundaries with the current block. In some examples, the current template can be along the top boundary or the left boundary of the block for which the affine motion is to be derived. In some cases, the current template has an L-shape. For example, the current template can share the top and left boundaries of the current block. In other cases, the template can have any other suitable shape. In some examples, the template can include reconstructed pixels in one or more reference figures of the current block (for example, the collocated figure for temporal MV prediction in HEVC). In such examples, the derived affine motion vectors can be scaled according to the POC distances among the current figure, the target reference figure of the current block, and the reference figure in which the template is located.
1104 of current block 1102. Current block 1102 can be a
Petition 870190090180, of 9/11/2019, p. 68/167
62/121 cryptography unit (CU), a prediction unit (PU), or any other suitable block of a picture. The pixels in the current affine model 1104 include pixels previously reconstructed from blocks that are neighbors to the current block 1102. In the example in Figure 11A, the current affine model 1104 is in an L-shaped pattern, which can be useful for determining points controls that can be positioned in the top left corner and the top right corner of the current related model 1104.
[0137] Figure 11B is a diagram illustrating the current block 1102 with the current template 1104 and a reference block 1110 of a reference figure with a reference template 1112. Although the reference template 1112 is shown in Figure 11B as having the same shape as the current template 1104, the reference template 1112 may not have the same shape as the current template 1104, depending on where the reference pixels for the pixels of the current template 1104 are, given a set of affine motion parameters. Control points 1106 and 1108 are defined for the current block 1102. Control point 1106 is located at the top left corner of the current template 1104, and control point 1108 is located at the top right corner of the current template 1104. As noted above, the affine motion vectors $v_0$ and $v_1$ for control points 1106 and 1108 of the current block 1102 can be derived by minimizing the distortion between the affine prediction (which corresponds to the pixels in the reference template 1112) and the reconstructed pixels of the current template 1104 of the current block 1102. For example, using the pixels of the current template 1104 and the collocated pixels in the reference template 1112, equations (2)-(6) above can be used to iteratively solve for the affine motion parameters (for example, a, b, c, d) until an optimal set of affine motion parameters is determined for control points 1106 and 1108 of the current block 1102.
[0138] An initial motion vector (also referred to as an initial motion vector seed, or seed) is needed to determine the first iteration of affine motion parameters. The initial motion vector is needed by the decoding device to identify the reference template 1112. For example, the initial motion vector points to the reference template 1112 and can thus be used to identify in which reference figure, and where within that reference figure (corresponding to the reference template 1112), to look for the information needed to derive the affine motion parameters for the current block 1102. The search for the affine motion parameters in the reference block 1110 of the reference figure is performed around the pixel pointed to by the initial motion vector.
[0139] The initial motion vector can be determined using any suitable technique. For example, a best translational motion vector can be determined for the current block 1102 and used as the initial motion vector for deriving the affine motion for the current block 1102. Note that a translational motion vector is determined for an entire block (for example, the current block 1102), while affine motion vectors are determined for all pixels or for certain sub-blocks of a block. In some cases, frame rate up-conversion (FRUC) template matching can be performed to determine a translational motion vector for the entire current block 1102. For example, template matching can be used to derive the motion information of the current block 1102 by finding the best match between a template (top and/or left neighboring blocks of the current block) in the current figure and a block (for example, of the same size as the template) in a reference figure. The template used for FRUC template matching can be the same template or a different template from the current template 1104. In an illustrative example, the current template 1104 is L-shaped (as shown in Figure 11A-Figure 11C), while the FRUC template matching template can have a shape like the template 1216 shown in Figure 12, which is discussed in more detail below.
[0140] The FRUC mode can be considered a special type of fusion mode, with which the motion information of a block is not signaled but is instead derived on the decoder side. Two types of FRUC mode include bilateral matching and template matching. In some cases, a FRUC marker can be signaled for a block (for example, a CU or the like) when a fusion marker is true for the block. When the FRUC marker is false, a fusion index can be signaled and the regular fusion mode can be used. When the FRUC marker is true, an additional FRUC mode marker can be signaled to indicate which FRUC mode (for example, bilateral matching or template matching) should be used to derive the translational motion information for the block.
[0141] During the translational motion derivation process, an initial translational motion vector can be derived for the entire block (for example, a CU or the like) using bilateral matching or template matching. For example, the fusion motion vector (MV) candidate list of the block can be checked, and the candidate motion vector from the fusion MV candidate list that leads to the minimum matching cost can be selected as the initial translational motion vector, with the pixel it points to in the reference figure used as a starting point for a local search. A local search based on bilateral matching or template matching can then be performed around the starting point, and the motion vector that results in the minimum matching cost can be taken as the motion vector for the entire CU. Subsequently, the motion information can be further refined at the sub-block level, with the derived CU motion vectors as the starting points.
[0142] As noted above, FRUC-mode template matching can be performed to determine a translational motion vector for the current block 1102. Figure 12 illustrates an example of template matching. In template matching, a template 1216 can be used to derive motion information relative to a Reference Frame 0 1204. For example, the template 1216 can include top and/or left neighboring blocks of a current block 1212 in a current frame 1202. In this example, a set of blocks that best matches the template 1216 can be found in Reference Frame 0 1204, where the set of blocks has the same size and/or configuration as the template 1216. A motion vector 1220 can then be determined using the location of the set of blocks and a relative location of the current block 1212 in Reference Frame 0 1204. The relative location of the current block 1212 can be determined from an orthogonal geometric axis 1230 passing, for example, through the center of the current block 1212.
[0143] FRUC template matching can be performed for bi-predicted or uni-predicted blocks. For example, template matching can be performed for each reference figure list independently. The template 1216 includes previously reconstructed pixels in the current figure. The motion of the current block 1212 is determined using the neighboring pixels in the template 1216. On the decoder side, the best translational motion for the template 1216 is determined and used as the translational motion vector of the current block 1212. The search process can include searching for the minimum SAD between the template of the current block 1212 and the template in the reference figure.
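In an illustrative, non-limiting example, the search described above, which finds the displacement minimizing the SAD between the current template and the correspondingly displaced samples of the reference figure, can be sketched as follows in Python. The brute-force integer-pixel search and all names are illustrative assumptions; a real implementation would search around a candidate starting point and at fractional precision.

```python
import numpy as np

def template_match(cur_frame, ref_frame, tpl_mask, search=8):
    """Brute-force template matching sketch. `tpl_mask` is a boolean map
    marking the reconstructed template pixels (for example, the rows above
    and the columns to the left of the current block) in `cur_frame`. The
    integer displacement minimizing the SAD between the template and the
    same pixel pattern shifted in `ref_frame` is returned as the
    translational MV."""
    ys, xs = np.nonzero(tpl_mask)
    tpl = cur_frame[ys, xs].astype(np.int64)
    h, w = ref_frame.shape
    best_sad, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ys + dy, xs + dx
            if ry.min() < 0 or rx.min() < 0 or ry.max() >= h or rx.max() >= w:
                continue  # the shifted template would leave the figure
            sad = np.abs(tpl - ref_frame[ry, rx].astype(np.int64)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv

# Content displaced by (+3, -2) between the reference and current frames:
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
mask = np.zeros((64, 64), dtype=bool)
mask[28:32, 28:44] = True  # rows above a block at (32, 32)
mask[32:48, 28:32] = True  # columns to the left of the block (L-shape)
print(template_match(cur, ref, mask))  # (3, -2)
```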
[0144] Another FRUC mode is bilateral matching. Figure 13 illustrates an example of bilateral matching. In bilateral matching, the motion information for a current block 1312 in a current frame 1302 can be derived, where the current frame 1302 is being generated for frame rate up-conversion. Specifically, a continuous motion trajectory 1310 can be assumed between a first block 1314 in a first reference frame (Reference Frame 0 1304) and a second block 1316 in a second reference frame (Reference Frame 1 1306). A motion vector MV0 1320 relative to Reference Frame 0 1304 can be determined for the current block 1312. For example, the position of the current block in Reference Frame 0 1304, as given by an orthogonal geometric axis 1330 centered on the current block 1312, can be used to determine MV0 1320. Similarly, a motion vector MV1 1322 relative to Reference Frame 1 1306 can be determined using the position of the current block in Reference Frame 1 1306, as given by the orthogonal geometric axis 1330. Because the motion trajectory 1310 is assumed to be continuous, MV0 1320 and MV1 1322 can be proportional to the temporal distances (TD0 1332 and TD1 1334, respectively) between the current frame 1302 and the two reference frames 1304 and 1306. For example, MV0 1320 can be scaled based on TD0 1332, and MV1 can be scaled based on TD1 1334.
[0145] In some cases, TD0 1332 and TD1 1334 can be the same. In such cases, the results of bilateral matching can be the same as the results of mirror-based bidirectional motion vector derivation. In some cases, bilateral matching can be used to determine the initial motion vector (the translational motion vector) for the first iteration of the template matching based affine motion derivation.
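In an illustrative, non-limiting example, the proportional scaling of MV0 and MV1 by the temporal distances TD0 and TD1 can be sketched as follows in Python. The sign convention (MV0 pointing backward to Reference Frame 0, MV1 forward to Reference Frame 1) and all names are illustrative assumptions.

```python
def bilateral_mvs(trajectory_mv, td0, td1):
    """Sketch of bilateral matching MV scaling: given a candidate motion
    trajectory expressed as per-POC-unit motion (dx, dy), return MV0 and
    MV1 scaled by the temporal distances TD0 and TD1 so that both lie on
    one continuous trajectory."""
    dx, dy = trajectory_mv
    mv0 = (-dx * td0, -dy * td0)  # toward Reference Frame 0
    mv1 = (dx * td1, dy * td1)    # toward Reference Frame 1
    return mv0, mv1

# An object moving (+2, -1) per POC, with the current frame 1 POC after
# Reference Frame 0 and 3 POCs before Reference Frame 1:
print(bilateral_mvs((2, -1), td0=1, td1=3))  # ((-2, 1), (6, -3))
```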
[0146] In an encoder, the decision on whether to use the FRUC mode for a CU can be based on a rate distortion cost selection, as is done, for example, for a normal fusion candidate. That is, a rate distortion optimization (RDO) cost can be determined for each of the two matching modes (for example, bilateral matching and template matching) for a given CU. The matching mode that has the lowest cost can be further compared to other CU modes. When a FRUC matching mode has the lowest cost, a FRUC marker can be set to indicate to a decoding device that the FRUC fusion mode should be used for the CU. In addition, the matching mode to be used can also be indicated in the bit stream (for example, in the PPS, SPS, VPS, in an SEI message, or the like). For example, it can be indicated in the bit stream that FRUC template matching should be used to determine the initial motion vector for the first iteration of the template matching based affine motion derivation. The decoding device can then make this determination based on the indication in the bit stream (for example, a variable, a marker, or another syntax item in the PPS, SPS, VPS, in an SEI message, or the like).

[0147] In some examples, an optical flow based motion refinement can follow the FRUC template matching to obtain a translational motion vector with greater accuracy. In some examples, the best translational motion vector can be used directly as the initial motion vector seed for the affine motion derivation.
[0148] In some examples, if any neighboring block has an affine motion vector, the affine motion vector of the neighboring block can be used as the initial motion vector seed for the affine motion derivation. For example, the affine fusion mode (AF_MERGE) described above can be used to determine an initial motion vector for the template matching based affine motion derivation. In some cases, a distance (for example, a SAD or the like) can be determined for the template matching translational motion vector (derived by FRUC template matching) and for the affine motion vector (of the neighboring block), and the motion vector that has the smallest distance can be used. In some cases, the affine motion vector of the neighboring block can be used directly as the initial motion vector seed.
[0149] In some cases, when a rotation-invariant and/or scale-invariant operator is available (for example, from an upstream computer vision subsystem, from an upstream pre-processing block in the same video processing pipeline, or the like), key point correspondences can be used separately to derive the affine parameters. In an illustrative example, in a 4-parameter or a 6-parameter affine model, two (for example, for the 4-parameter model), three (for example, for the 6-parameter model), or more corresponding key points can be found, such as Scale-Invariant Feature Transform (SIFT) feature points, in the local neighborhood or search area, and the associated affine parameters can be derived with a smaller number of iterations by taking the resulting parameter set as a starting point. The scale parameter can be derived using two key points.
[0150] In some implementations, an affine model (for example, a 4-parameter affine model or a 6-parameter affine model) can be determined based on previously coded information, such as the block size and the frame type.
[0151] The motion vector (for example, a translational motion vector determined using template matching, or an affine motion vector from a neighboring block) can then be used as the initial motion vector for the affine motion search. Returning to Figure 11B, the initial motion vector seed points to a certain pixel in the reference figure, which defines where in the reference block 1110 the reference template 1112 will be located for use by a decoding device. The decoding device can then use the current template 1104 and the reference template 1112 to perform the affine motion derivation for the current block 1102. Once the initial motion vector seed is determined, a Taylor-expansion-based method (such as the Taylor-expansion-based method described above with respect to equations (2)-(6)) can be used to solve for the affine motion based on the current template 1104 and its affine prediction (represented by the reference template 1112). In some cases, the affine motion can be derived iteratively, as described below. The maximum number of iterations can be predefined or signaled. Alternatively or additionally, the number of iterations can depend on the context, such as the size of the current template 1104 (or of the current block), the prediction direction (biprediction or uniprediction), or any other suitable factor. In some cases, an interpolation filter other than that used in regular inter-prediction interpolation processes, such as a bilinear interpolation filter, can be used in solving for the affine motion.
[0152] As noted above, once the initial motion vector is determined, equations (2)-(6) described above can be used to solve for a first iteration of affine motion parameters, with the initial iteration using the initial motion vector seed. As previously described, the affine motion parameters can include the parameters a, b, c, d defined as: $a = (v_{1x} - v_{0x})/w$, $b = (v_{1y} - v_{0y})/w$, $c = v_{0x}$, and $d = v_{0y}$. After the first iteration is performed with a set of initial affine motion parameters (an initial set of values a, b, c, d of an initial motion model), a new set of affine motion parameters is determined by equation (6). For example, the known values $V_x$ and $V_y$ of the initial motion vector seed and the known position (x, y) of the pixel or sub-block (in the current block 1102) to which the initial motion vector seed refers can be used to determine the set of initial affine motion parameters a, b, c, d using equations (4)-(6). When deriving the affine motion parameters in the first iteration, the initial affine motion model can be used to derive the per-pixel motion for each pixel (or, in some cases, fewer than all of the pixels) in the current template 1104. For example, the initial values a, b, c, d of the initial affine motion model can be entered into equations (4) and (5), or into the equivalent equation (1), to determine the motion vector (defined by $V_x$ and $V_y$) for each pixel (at location (x, y)) of the current template 1104. A reference template pixel can then be located by the motion vector determined for each pixel of the current template 1104. For example, the decoding device can locate the reference pixel $I_{t_0}^i$ for each pixel $I_t^i$ of the current template 1104 using the affine motion parameters, where i is the pixel index. The corresponding reference pixels $I_{t_0}^i$ in the reference block 1110 form the reference template 1112. The decoding device then has the pixels $I_t^i$ of the current template 1104 and the pixels $I_{t_0}^i$ of the reference template 1112, and can calculate the horizontal gradient $G_{0x}^i$ and the vertical gradient $G_{0y}^i$ for each pixel of the reference template 1112. As noted above, i is the index for the pixels of the current template 1104 and of the reference template 1112. Equation (6) can then be used to solve for the affine motion parameters (a, b, c, d) for the current block 1102. For example, the decoding device can derive the new affine motion parameters using equation (6) and the known values, including the pixel values and the locations (x, y) of the pixels, the vertical gradient $G_{0y}^i$, and the horizontal gradient $G_{0x}^i$ (where the vertical and horizontal gradients represent the gradient around the reference pixel).
[0153]
Each iteration includes performing equations (4)-(6). For example, equations (4) and (5) can be used to find new reference pixels in the reference template 1112. Each pixel within the current template 1104 can determine its reference pixel $I_{t_0}^i$ using the motion model of that iteration. The reference pixels of all the pixels of the current template 1104 form the reference template 1112, in which case the reference template 1112 may not have the same shape (for example, an L-shape) as the current template 1104. The pixels $I_t^i$ of the current template 1104 and the pixels $I_{t_0}^i$ of the new reference template 1112 can then be used to derive new affine motion parameters by performing equation (6).
[0154]
In an illustrative example, for each iteration, the per-pixel motion vector ($V_x$, $V_y$) of each pixel in the current template 1104 points to an associated reference pixel in the reference template 1112 (determined using equations (4) and (5) and the affine motion parameters of a previous iteration). For example, a pixel $I_t^k$ in the current template 1104 and an associated reference pixel $I_{t_0}^k$ in the reference template 1112 are referred to in this document as a pair of collocated pixels. For each iteration, the collocated pixel pairs and the corresponding motion vectors are updated using equations (4) and (5) together with the affine parameters updated from a previous iteration. The updated collocated pixel pairs (after the new reference pixels are found using equations (4) and (5)) can then be used to solve equation (6) again. For example, using the pairs of collocated pixels (a pixel of the current template 1104 and a corresponding pixel of the reference template 1112 located using the affine motion model with the parameters derived in a previous iteration), another set of affine motion parameters (for example, another set of parameters a, b, c, d) can be derived. Such an iterative process can be performed a number of times until a maximum limit (for example, a maximum of five iterations) is reached, or until all pixels in the current template 1104 have been processed. Each iteration of equation (6) results in a different affine motion model that has a different set of affine motion parameters (different values of a, b, c, d for the current block 1102) that could be used as the affine motion model for the current block 1102.
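In an illustrative, non-limiting example, the iterative procedure of paragraphs [0152]-[0154] can be sketched as follows in Python. The sampling functions sample_ref and sample_grad are stand-ins for sub-pixel interpolation of the reference figure and of its gradients; all names are illustrative assumptions, and the update step reuses the linearized least squares solve sketched after equation (6).

```python
import numpy as np

def derive_affine(xs, ys, tpl, sample_ref, sample_grad, init_mv, iters=5):
    """Iterative template matching affine derivation sketch. (xs, ys) and
    `tpl` hold the positions and values of the current template's pixels.
    `sample_ref(x, y)` and `sample_grad(x, y)` return the reference figure
    value and its (gx, gy) gradients at (possibly fractional) positions.
    The model starts from the translational seed `init_mv`, re-solves the
    linearized equation (6) each iteration, and keeps the parameter set
    with the lowest SAD."""
    a, b, c, d = 0.0, 0.0, float(init_mv[0]), float(init_mv[1])
    best_sad, best_params = np.inf, (a, b, c, d)
    for _ in range(iters):
        # Locate each template pixel's reference pixel under the current model.
        rx = xs + (a * xs - b * ys + c)
        ry = ys + (b * xs + a * ys + d)
        ref = sample_ref(rx, ry)
        gx, gy = sample_grad(rx, ry)
        sad = float(np.abs(tpl - ref).sum())
        if sad < best_sad:
            best_sad, best_params = sad, (a, b, c, d)
        # Incremental update from the linearized least squares system
        # (see the sketch following equation (6)).
        A = np.stack([gx * xs + gy * ys, gy * xs - gx * ys, gx, gy], axis=1)
        upd, *_ = np.linalg.lstsq(A, tpl - ref, rcond=None)
        a, b, c, d = a + upd[0], b + upd[1], c + upd[2], d + upd[3]
    return best_params
```

As a design note, the sketch applies an incremental update each iteration and retains the parameter set with the lowest SAD, mirroring the selection criterion described in the next paragraph; a production implementation would also bound the search and reuse the codec's interpolation filters.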
[0155] The best set of affine motion parameters from the iterations that were performed (for example, the five iterations or another number) can be selected as the affine motion model for the current block 1102. For example, the best set of affine motion parameters can be selected based on a quality metric. An illustrative example of a quality metric is a sum of absolute differences (SAD). SAD is a measure of the similarity between image blocks, and can be calculated by taking the absolute difference between each pixel in an original block (for example, the pixels in the current template 1104) and the corresponding pixel in the block being used for comparison (for example, the pixels in the reference template 1112). The differences can be summed to create a block similarity metric. In such an example, the set of affine motion parameters that results in the minimum SAD metric can be selected as the affine motion model for the current block 1102. Any other suitable quality metric can be used, including, but not limited to, a sum of absolute transformed differences (SATD), mean squared error (MSE), mean absolute error (MAE), mean absolute difference (MAD), and peak signal-to-noise ratio (PSNR), among others.
[0156] The SAD metric can be defined as:
$$SAD = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right|$$

[0157] where $C_{ij}$ and $R_{ij}$ are the pixels (with i, j being the pixel coordinate location) that are compared in the current block (for example, the current template 1104) and in the reference block (the reference template 1112), respectively, and N is the size of an N x N block.
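In an illustrative, non-limiting example, the SAD metric defined above can be computed as follows in Python:

```python
import numpy as np

def sad(current: np.ndarray, reference: np.ndarray) -> int:
    """Sum of absolute differences between two equally sized pixel arrays,
    per the definition above; lower values indicate more similar blocks."""
    return int(np.abs(current.astype(np.int64)
                      - reference.astype(np.int64)).sum())

cur = np.array([[10, 12], [14, 16]])
ref = np.array([[11, 12], [13, 18]])
print(sad(cur, ref))  # |10-11| + |12-12| + |14-13| + |16-18| = 4
```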
[0158]
As shown in Figure 11B, control point 1106 is the top left point of the current template 1104, and control point 1108 is the top right point of the current template 1104. In some cases, the top left and top right points can be located at points of the current template 1104 where pixels are not located (for example, at a far top left corner and a far top right corner of the template 1104). In other cases, the top left and top right points can be located at pixel locations of the current template 1104 (for example, a top left pixel and a top right pixel of the template 1104). The set of motion parameters of the affine motion model that is determined to be optimal by the decoding device (for example, based on a SAD metric) defines the motion vectors $v_0$ and $v_1$ for control points 1106 and 1108. The motion vectors $v_0$ and $v_1$ of the two control points 1106 and 1108 can then be used to derive the motion of each pixel or of each sub-block within the current block 1102.
[0159] Figure 11C is a diagram that illustrates per-sub-block motion determined based on the motion vectors v0 and v1 from the two control points 1106 and 1108. As shown, the current block 1102 is divided into a set of 4 x 4 sub-blocks, with sixteen total sub-blocks (for example, sub-block 1122). The motion vectors v0 and v1 of the control points 1106 and 1108 of the current related model 1104 are used to determine the motion of each sub-block in the current block 1102. In an illustrative example, given the motion vectors v0 and v1 of the control points 1106 and 1108, the width (w) of the current block 1102, and the position (x, y) that represents a sub-block, equation (1) can be used to determine the motion vector (represented by Vx, Vy) of the sub-block. In another example, given the values of a, b, c, d known from the selected related motion model and the position (x, y) that represents the sub-block, equations (4) and (5) can be used to determine the motion vector (Vx, Vy) of the sub-block. In some cases, the position (x, y) at the center of the sub-block, the position (x, y) at a corner of the sub-block, or the position (x, y) somewhere else in the sub-block can be used to represent the sub-block in equation (1) or in equations (4) and (5).
[0160] The offset of the current block 1102 from the current related model 1104 (shown as 1120 in Figure 11A) can be taken into account when determining the position coordinates (x, y) used for the pixels or sub-blocks of the current block 1102. For example, if the current block 1102 is 16 pixels x 16 pixels, and the current model 1104 has four pixels in each direction (for example, four rows of pixels in the top portion of the model and four columns of pixels in the left portion), the top left pixel in the top left sub-block of the current block 1102 can be at location (4, 4). In such an example, the value of (4, 4) can be used as the position (x, y) in equation (1) or in equations (4) and (5) when determining the motion vector for the first sub-block (in the top left corner) of the current block 1102.
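A minimal sketch of this per-sub-block derivation, assuming that equation (1) takes the usual 4-parameter form driven by the control point motion vectors v0 and v1, and using the 16 x 16 block, 4 x 4 sub-blocks, and (4, 4) model offset of the example above (the control point values are invented for illustration):

    def subblock_mv(v0, v1, w, x, y):
        # motion vector at position (x, y) given the control point vectors
        # v0 (top left) and v1 (top right) and the block width w
        ax = (v1[0] - v0[0]) / w
        ay = (v1[1] - v0[1]) / w
        return (ax * x - ay * y + v0[0],   # Vx
                ay * x + ax * y + v0[1])   # Vy

    v0, v1 = (1.0, 0.5), (2.0, 1.5)        # illustrative control point vectors
    mvs = [[subblock_mv(v0, v1, 16, sx + 4, sy + 4)   # (4, 4) model offset
            for sx in range(0, 16, 4)]
           for sy in range(0, 16, 4)]      # sixteen 4 x 4 sub-blocks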
[0161] As shown in Figure 11C, after the affine motion (represented by the control point motion vectors v0 and v1) is derived for the current block 1102 based on the affine model, the affine motion can be mapped to a translational motion for each sub-block of the current block 1102 according to the position of each sub-block. For example, after the motion vector for each sub-block is derived, that motion vector can be treated as a translational motion vector. In an illustrative example, the mapping for a 4-parameter related model is Vx = a · x + b · y + c and Vy = b · x - a · y + d, where x and y indicate the position of a sub-block (at the center or at a corner of the sub-block). The translational motion can be considered the same for all pixels within a sub-block.
[0162] In some examples, the size of a sub-block and/or the number of sub-blocks in a current block can be predefined. For example, the size of the sub-blocks in the current block 1102 can be predefined as 4 pixels x 4 pixels, or another suitable size. In some examples, the size of the sub-blocks and/or the number of sub-blocks in a current block may be flagged or otherwise included in the bit stream (for example, in a PPS, SPS, VPS, SEI message, or the like). In some examples, the size of a sub-block can be adaptively changed based on the current block size. In some examples, the size of a sub-block may be the same as that defined in the FRUC mode.
[0163] In some deployments, to reduce complexity, only partial sets of pixels in the current affine model 1104 are used to derive the affine motion for a current block. The size (for example, the number of top bounding rows and the number of left bounding columns) of the current related model can be signaled in the bit stream (for example, in one or more sets of parameters, such as a PPS, SPS, or VPS) or can be predefined. Any number of predefined pixels can be included in the current related model (and in the related reference model). In an illustrative example, a 4-pixel related model can be used, in which case, for an L-shaped model (for example, the current related model 1104), the related model can include four rows of pixels in the top portion of the model and four columns of pixels in the left portion of the model.
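A minimal sketch of assembling such a 4-pixel L-shaped model, assuming the top left corner of the current block is at (bx, by); the coordinate convention and the helper name are illustrative:

    def l_shaped_model_pixels(bx, by, width, height, t=4):
        # t rows of pixels above the block (spanning the corner region) plus
        # t columns of pixels to the left of the block
        coords = [(x, y) for y in range(by - t, by)
                         for x in range(bx - t, bx + width)]
        coords += [(x, y) for y in range(by, by + height)
                          for x in range(bx - t, bx)]
        return coords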
[0164] In some examples, affine motion vectors can be derived by minimizing a weighted error (or distortion) between the affine prediction and the reconstructed pixels of the current affine model of the current block. For example, outliers between the affine prediction and the reconstructed pixels of the model can be removed or multiplied by different weights during the derivation. Such outlier removal can improve the stability of the motion vector derivation. In an illustrative example, the decoding device can derive the related motion vectors by minimizing the distortion between the related prediction and the reconstructed pixels of the current related model of the current block. Based on the derived motion vectors, the decoding device can calculate the distortion value for each pixel. According to the distortion value, the decoder can assign different weights to the pixels and can then derive motion vectors again by minimizing the weighted distortion between the related prediction and the reconstructed pixels of the related model of the current block.
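A minimal sketch of the second, weighted pass, reusing the solve structure of the earlier sketch; the weighting rule (dropping pixels above a distortion threshold and down-weighting the rest) is one illustrative choice among many:

    import numpy as np

    def reweighted_affine_parameters(pairs, distortions, threshold):
        # scaling each equation by w minimizes the weighted squared error;
        # w = 0 removes an outlier pixel entirely
        rows, rhs = [], []
        for ((x, y), (xr, yr)), d in zip(pairs, distortions):
            w = 0.0 if d > threshold else 1.0 / (1.0 + d)
            rows.append([w * x, w * y, w, 0.0])
            rhs.append(w * (xr - x))
            rows.append([-w * y, w * x, 0.0, w])
            rhs.append(w * (yr - y))
        params, _, _, _ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
        return params  # a, b, c, d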
[0165] In some examples, a filtering process (for example, a low-pass filter, or another suitable filter) can be applied to the current related model and/or to its related prediction (including the reference related model) to improve the stability of the derivation.
[0166] In some examples, for biprediction or multiple-hypothesis prediction, the related motion can be derived for each hypothesis separately or jointly. In some cases, when the related motion is derived separately, an independent model can be used for each hypothesis. For example, in the case of two hypotheses, two independent models T0 and T1 can be used. Based on the two models T0 and T1, MV0 and MV1 can be derived. In some cases, when the related motion is derived jointly, the model can be updated based on an MV that has already been derived. For example, in the case of two hypotheses, when the second MV is derived, the model T1 can be updated as T1', so that T1' = 2 * T1 - Pred(MV0), where Pred(MV0) represents the prediction with the motion MV0. An iterative related motion derivation can also be allowed in the joint derivation.
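A minimal sketch of the joint-derivation update, assuming T1 and the prediction obtained with MV0 are given as equally sized numeric arrays (the function and argument names are illustrative):

    import numpy as np

    def updated_second_model(t1, pred_mv0):
        # T1' = 2 * T1 - Pred(MV0): the contribution of the first hypothesis
        # is removed from the model before MV1 is derived
        return 2 * t1.astype(np.int32) - pred_mv0.astype(np.int32)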
[0167] The related motion derivation mode based on model compatibility can be flagged (for example, in a PPS, SPS, VPS, SEI message, or the like) as an independent interprediction mode with a marker or other syntax item. A syntax item can include a variable, a marker, a syntax element, a syntax structure, or another suitable part of a syntax included in a PPS, SPS, VPS, SEI message, or the like. In some cases, the related motion derivation mode based on model compatibility can be signaled as a special FRUC mode. In some examples, the related motion derivation mode based on model compatibility can be flagged and/or used only when the related model of the current block is available. For example, in some cases, the L-shaped model (or another suitably shaped model) can be considered available only when both the top and left reconstructed blocks are available. In some cases, when flagged as a special FRUC mode, the binarization illustrated in Table 1 below can be used when all FRUC modes are available for selection:
Binarization    FRUC mode
0               FRUC inactive
11              FRUC bilateral compatibility
101             FRUC related model (related motion derivation based on model compatibility)
100             FRUC model compatibility

TABLE 1

[0168] In an illustrative example, the context of the third bin (relating to the related motion derivation based on model compatibility) from Table 1 above can be set to 0 if none of the neighbors above or to the left is in an affine mode (AF_MERGE mode, AF_INTER mode, FRUC TEMPLATE AFFINE mode), 1 if either the above or the left neighbor is in an affine mode, and 2 if both the above and left neighbors are in an affine mode. The affine mode in this document includes, but is not limited to, the regular affine inter mode, the affine fusion mode, and the affine model mode (related motion derivation based on model compatibility).
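A minimal sketch of this context selection rule (the function name is illustrative):

    def third_bin_context(above_is_affine, left_is_affine):
        # 0 if neither neighbor is in an affine mode, 1 if exactly one is,
        # 2 if both are (AF_MERGE, AF_INTER and FRUC TEMPLATE AFFINE count)
        return int(above_is_affine) + int(left_is_affine)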
[0169] In some examples, the related motion (v0 and v1) derived using the techniques described above can be used as the motion vector predictor (MVP) for conventional affine inter modes (for example, the AF_INTER or AF_MERGE mode). For example, for conventional affine modes, at least one affine motion predictor can be derived on the decoder side in the same way as described above. In some cases, a 4-parameter related model is used to derive the related motion predictor when the block is signaled to use the 4-parameter related model. For example, in the affine inter mode (AF_INTER), a motion vector difference (MVD) can be signaled to the decoding device (for example, in a PPS, SPS, VPS, SEI message, or the like). The MVD can include a difference between a predictor (for example, a motion vector of blocks A, B, or C used as a predictor for sub-block 910 in Figure 9) and a control point motion vector (for example, the motion vector of sub-block 910). The MVD can then be added to a motion vector predictor (MVP) by a decoding device to determine the control point motion vectors v0 and v1. The related model can be used to generate the MVP. For example, the decoding device can derive a, b, c, d using the current related model and the reference related model, as described above (for example, using the set of ideal related motion parameters). The motion parameters a, b, c, d define the motion vectors v0 and v1 of the control points. These derived motion vectors can be used as the MVPs for sub-blocks 910 and 912. For example, the motion vector v0 can be used as the MVP for sub-block 910, and the motion vector v1 can be used as the MVP for sub-block 912. The MVPs can then be added to the corresponding MVDs.
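A minimal sketch of this reconstruction step, assuming the model-derived vectors v0 and v1 serve as the predictors and that one MVD is signaled per control point (the names are illustrative):

    def reconstruct_control_point_mvs(mvp0, mvp1, mvd0, mvd1):
        # each control point motion vector is its predictor plus the
        # corresponding signaled motion vector difference
        v0 = (mvp0[0] + mvd0[0], mvp0[1] + mvd0[1])
        v1 = (mvp1[0] + mvd1[0], mvp1[1] + mvd1[1])
        return v0, v1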
[0170] In some examples, the derivation of related motion information on the decoder side can be performed directly on reconstructed pixel blocks. In one example, after the figure is reconstructed (for example, after a loop filter), the figure is divided into blocks, and the related motion derivation based on model compatibility, as described above, is then applied to each block to derive the related motion. The derived motion information can then be used for motion vector prediction.
[0171] In some examples, to reduce complexity, some coding tools can be restricted when the affine model mode is used. Such restrictions can be predefined or signaled in the bit stream. In an illustrative example, BIO may not be applied to a block when the related motion derivation based on model compatibility is used for the block. In another illustrative example, illumination compensation (IC) may not be applied to a block when the related motion derivation based on model compatibility is used for the block.
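A minimal sketch of such a restriction, with the block represented as a dictionary of hypothetical per-block flags:

    def apply_mode_restrictions(block):
        # when the related motion derivation based on model compatibility is
        # used, BIO and illumination compensation are switched off
        if block.get("affine_model_mode", False):
            block["bio_enabled"] = False
            block["ic_enabled"] = False
        return block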
[0172] Figure 14 is a flow chart that illustrates an example of a process 1400 for deriving one or more sets of related motion parameters in a decoder using the techniques described in this document. In block 1402, the process 1400 includes obtaining, through the decoder, video data from an encoded video data stream. The video data includes at least one current figure and a reference figure. The current figure includes a figure that is currently being decoded. In some examples, the reference figure can be identified using a reference figure list or index (for example, a reference figure list 0 (RefPicList0)). In some cases, multiple reference figures can be accessed for the current figure, in which case the process 1400 can be performed using more than one reference figure. For example, a reference figure list 0 (RefPicList0) and a reference figure list 1 (RefPicList1) may indicate which two reference figures are associated with the current figure.

[0173] In block 1404, the process 1400 includes determining, through the decoder, a set of related motion parameters for a current block of the current figure. The set of related motion parameters is used to perform the motion compensation prediction for the current block. The set of related motion parameters is determined using a current related model of the current block and a related model of the reference figure. Using the model-based approach, the set of related parameters can be determined through the decoder using the process on the decoder side without using any related motion signaled in the bit stream. For example, no related motion parameters are decoded from the encoded video data stream to determine the set of related motion parameters. In some cases, the related motion parameters are not included in the bit stream.
[0174] In some cases, the process 1400 may determine the set of related motion parameters by obtaining, through the decoder, a set of initial related motion parameters. The set of initial related motion parameters can be determined using any suitable technique. In an illustrative example, the set of initial related motion parameters can be determined based on a translational motion vector determined for the current block. In some cases, the translational motion vector can be determined using any suitable technique, such as a frame rate upward conversion (FRUC) model compatibility mode or another suitable technique. In another illustrative example, the set of initial related motion parameters can be determined based on a related motion vector of a neighboring block of the current block. For example, the related motion vector of a neighboring block can be used as the seed of the initial motion vector for the related motion derivation of the current block. In an illustrative example, the affine fusion mode (AF_MERGE) described above can be used to determine the affine motion vector that can be used as the initial motion vector.
[0175] The process 1400 can determine the set of related motion parameters by further deriving, through the decoder, one or more related motion vectors for one or more pixels in the current related model of the current block using the set of initial related motion parameters. The current related model of the current block includes reconstructed pixels neighboring the current block. An example of the current related model is shown in Figure 11A, Figure 11B and Figure 11C. The process 1400 can then determine, through the decoder, one or more pixels in the reference related model of the reference figure using the one or more related motion vectors derived for the one or more pixels in the current related model. The process 1400 can additionally minimize, through the decoder, an error between at least the one or more pixels in the current related model and the one or more pixels in the reference related model determined using the one or more related motion vectors. The process 1400 can then determine, through the decoder, the set of related motion parameters for one or more control points of the current related model based on the minimized error between at least the one or more pixels in the current related model and the one or more pixels in the reference related model. Such a process to determine the set of related motion parameters can be performed, for example, using equations (3)-(6).
[0176] In some examples, the process 1400 can determine the set of related motion parameters for the one or more control points of the current related model by determining a plurality of sets of related motion parameters for the one or more control points of the current related model with the use of at least the one or more pixels in the current related model and the one or more pixels in the reference related model determined using the one or more related motion vectors. For example, equations (4)-(6) can be performed iteratively, as described above, to determine multiple sets of related motion parameters. The process 1400 can determine a quality metric for each set of related motion parameters from the plurality of sets of related motion parameters. In some examples, the quality metric includes a sum of absolute differences (SAD). The process 1400 can then select, for the one or more control points of the current related model, the set of related motion parameters, from the plurality of sets of related motion parameters, that has the smallest metric among the plurality of sets of related motion parameters. As shown in Figure 11A, Figure 11B, and Figure 11C, two control points can be defined for the current block.
[0177] The process 1400 can determine motion vectors for one or more samples of the current block based on the set of motion parameters determined for the one or more control points of the current related model. For example, the process 1400 can determine motion vectors for a plurality of sub-blocks of the current block using the set of related motion parameters for the current block. An example of the sub-blocks of a current block is shown in Figure 11C. In some examples, instead of determining motion vectors for sub-blocks, the process 1400 can determine motion vectors for a plurality of pixels in the current block using the set of related motion parameters for the current block.
[0178] In some examples, the current related model of the current block includes one or more samples spatially neighboring the current block. In some cases, the spatially neighboring samples include samples from one or more of a top neighboring block or a left neighboring block. For example, the example shown in Figure 11A includes a current related model 1104 that includes samples from a top neighboring block (a block neighboring the top of the current block) and samples from a left neighboring block (a block neighboring the left of the current block). In some examples, the current related model includes an L-shaped block. The L-shaped block can include samples from a top neighboring block of the current block and samples from a left neighboring block of the current block (as shown in Figure 11A). In other examples, the current related model may include samples from a right neighboring block and/or a bottom neighboring block.
[0179] Figure 15 is a flow chart that illustrates an example of a process 1500 for encoding video data using the techniques described in this document. In block 1502, the process 1500 includes obtaining video data. The video data includes at least one current figure and a reference figure. The current figure includes a figure that is currently being encoded (or decoded in a decoding loop of an encoder). In some examples, the reference figure can be identified using a reference figure list or index (for example, a reference figure list 0 (RefPicList0)). In some cases, multiple reference figures can be used to encode the current figure, in which case the process 1500 can be performed using more than one reference figure. For example, a reference figure list 0 (RefPicList0) and a reference figure list 1 (RefPicList1) may indicate which two reference figures are associated with the current figure.
[0180] In block 1504, the process 1500 includes determining a set of related motion parameters for a current block of the current figure. The set of related motion parameters is used to perform the motion compensation prediction for the current block. The set of related motion parameters is determined using a current related model of the current block and a related reference model of the reference figure.
[0181] In block 1506, the process 1500 includes generating an encoded video data stream. The encoded video data stream includes a syntax item that indicates that the related motion derivation mode based on model compatibility must be used by a decoder for the current block. The syntax item can include a syntax element, a syntax structure, a variable, a marker, or the like, and can be included in a PPS, SPS, VPS, SEI message, or another part of the encoded video data stream. The encoded video data stream does not include any related motion parameters for determining the set of related motion parameters. For example, using the model-based approach, the set of related parameters can be determined through the decoder using a process on the decoder side without using any related motion signaled in the encoded video data stream. For example, no related motion parameters are decoded from the encoded video data stream to determine the set of related motion parameters.
[0182] In some cases, the process 1500 can determine the set of related motion parameters by obtaining an initial set of related motion parameters. The set of initial related motion parameters can be determined using any suitable technique. In an illustrative example, the set of initial related motion parameters can be determined based on a translational motion vector determined for the current block. In some cases, the translational motion vector can be determined using any suitable technique, such as a frame rate upward conversion (FRUC) model compatibility mode or another suitable technique. In another illustrative example, the set of initial related motion parameters can be determined based on a related motion vector of a neighboring block of the current block. For example, the related motion vector of a neighboring block can be used as the seed of the initial motion vector for the related motion derivation of the current block. In an illustrative example, the affine fusion mode (AF_MERGE) described above can be used to determine the affine motion vector that can be used as the initial motion vector.
[0183] The process 1500 can determine the set of related motion parameters by additionally deriving one or more related motion vectors for one or more pixels in the current related model of the current block using the set of initial related motion parameters. The current related model of the current block includes reconstructed pixels neighboring the current block. An example of the current related model is shown in Figure 11A, Figure 11B and Figure 11C. The process 1500 can then determine one or more pixels in the reference related model of the reference figure using the one or more related motion vectors derived for the one or more pixels in the current related model. The process 1500 can additionally minimize an error between at least the one or more pixels in the current related model and the one or more pixels in the reference related model determined using the one or more related motion vectors. The process 1500 can then determine the set of related motion parameters for one or more control points of the current related model based on the minimized error between at least the one or more pixels in the current related model and the one or more pixels in the reference related model. Such a process to determine the set of related motion parameters can be performed, for example, using equations (3)-(6).
[0184] In some examples, the process 1500 can determine the set of related motion parameters for the one or more control points of the current related model by determining a plurality of sets of related motion parameters for the one or more control points of the current related model with the use of at least the one or more pixels in the current related model and the one or more pixels in the reference related model determined using the one or more related motion vectors. For example, equations (4)-(6) can be performed iteratively, as described above, to determine multiple sets of related motion parameters. The process 1500 can determine a quality metric for each set of related motion parameters from the plurality of sets of related motion parameters. In some examples, the quality metric includes a sum of absolute differences (SAD). The process 1500 can then select, for the one or more control points of the current related model, the set of related motion parameters, from the plurality of sets of related motion parameters, that has the smallest metric among the plurality of sets of related motion parameters. As shown in Figure 11A, Figure 11B and Figure 11C, two control points can be defined for the current block.
[0185] The process 1500 can determine motion vectors for one or more samples of the current block based on the set of motion parameters determined for the one or more control points of the current related model. For example, the process 1500 can determine motion vectors for a plurality of sub-blocks of the current block using the set of related motion parameters for the current block. An example of the sub-blocks of a current block is shown in Figure 11C. In some examples, instead of determining motion vectors for sub-blocks, the process 1500 can determine motion vectors for a plurality of pixels in the current block using the set of related motion parameters for the current block.
[0186] In some examples, the current related model of the current block includes one or more samples spatially neighboring the current block. In some cases, the spatially neighboring samples include samples from one or more of a top neighboring block or a left neighboring block. For example, the example shown in Figure 11A includes a current related model 1104 that includes samples from a top neighboring block (a block neighboring the top of the current block) and samples from a left neighboring block (a block neighboring the left of the current block). In some examples, the current related model includes an L-shaped block. The L-shaped block can include samples from a top neighboring block of the current block and samples from a left neighboring block of the current block (as shown in Figure 11A). In other examples, the current related model may include samples from a right neighboring block and/or a bottom neighboring block.
[0187] In some examples, the process 1500 can store the encoded video data stream. In some cases, a processor of an encoder that performs the process 1500, or an apparatus (for example, a mobile device or other suitable device) comprising the encoder, can store the encoded video data stream in a memory of the encoder or in a memory of the apparatus that comprises the encoder. In some examples, the process 1500 may transmit the encoded video data stream.
[0188] In some examples, the processes 1400 and 1500 can be performed by a computing device or apparatus, such as the encoding device 104, the decoding device 112, or any other computing device. For example, the process 1400 may be performed by the decoding device 112, and the process 1500 may be performed by the encoding device 104. In some cases, the computing device or apparatus may include a processor, microprocessor, microcomputer, or other components of a device that is configured to perform the steps of the processes 1400 and 1500. In some instances, the computing device or apparatus may include a camera configured to capture video data (for example, a video sequence) including video frames. For example, the computing device may include a camera device, which may or may not include a video codec. As another example, the computing device may include a mobile device with a camera (for example, a camera device such as a digital camera, an IP camera or the like, a mobile phone or tablet computer that includes a camera, or another type of device with a camera). In some cases, the computing device may include a display for displaying images. In some instances, a camera or other capture device that captures the video data is separate from the computing device, in which case the computing device receives the captured video data. The computing device may additionally include a network interface, transceiver, and/or transmitter configured to communicate the video data. The network interface, transceiver, and/or transmitter can be configured to communicate Internet Protocol (IP) based data or other network data.
[0189] The processes 1400 and 1500 are illustrated as logic flow diagrams, the operations of which represent a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. In general, computer-executable instructions include routines, programs, objects, components, data structures and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
[0190] Additionally, the processes 1400 and 1500 can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) that collectively runs on one or more processors, by hardware, or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program that comprises a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.
[0191] The encryption techniques discussed in this document can be implemented in an exemplary video encoding and decoding system (for example, the system 100). In some examples, a system includes a source device that provides encoded video data to be decoded at a later time by a destination device. In particular, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device can comprise any of a wide range of devices, including desktop computers, notebook computers (that is, laptop computers), tablet computers, set-top boxes, telephone handsets such as so-called smart phones, so-called smart pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[0192] The destination device can receive the encoded video data to be decoded by means of a computer-readable medium. The computer-readable medium can comprise any type of medium or device capable of moving the encoded video data from the source device to the destination device. In one example, the computer-readable medium may comprise a communication medium to enable the source device to transmit encoded video data directly to the destination device in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations or any other equipment that may be useful to facilitate communication from the source device to the destination device.
[0193] In some examples, encoded data can be output from the output interface to a storage device. Similarly, encoded data can be accessed from the storage device via the input interface. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other storage media suitable for storing encoded video data. In an additional example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device. The destination device can access stored video data from the storage device via streaming or downloading. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device. Exemplary file servers include a web server (for example, for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. The destination device can access the encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a WiFi connection), a wired connection (for example, DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission, or a combination thereof.
[0194] The techniques of this disclosure are not necessarily limited to wireless applications or configurations. The techniques can be applied to video encryption in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions over the Internet, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some instances, the system can be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
[0195] In one example, the source device includes a video source, a video encoder and an output interface. The destination device may include an input interface, a video decoder and a display device. The video encoder of the source device can be configured to apply the techniques disclosed in this document. In other examples, a source device and a destination device may include other components or arrangements. For example, the source device can receive video data from an external video source, such as an external camera. Likewise, the destination device can interface with an external display device, instead of including an integrated display device.
[0196] The example system above is merely an example. Techniques for processing video data in parallel can be performed using any digital video encoding and / or decoding device. Although, in general, the techniques of this disclosure are performed by a video encoding device, the techniques can also be performed by a video encoder / decoder, typically referred to as a CODEC. Furthermore, the techniques of this disclosure can also be performed using a video processor. The source device and the target device are merely examples of such encryption devices in which the source device generates encrypted video data for transmission to the target device. In some examples, the source and destination devices may operate in a substantially symmetrical manner so that each of the devices includes video encoding and decoding components. Therefore, exemplary systems can support one-way or two-way video transmission between video devices, for example, for streaming video, video playback, video broadcasting or video telephony.
[0197] The video source may include a video capture device, such as a video camera, a video file that contains previously captured video, and/or a video feed interface for receiving video from a video content provider. As an additional alternative, the video source can generate data based on computer graphics as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source is a video camera, the source device and the destination device can form so-called camera phones or video phones. As mentioned above, the techniques described in this disclosure may be applicable to video encryption in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder. The encoded video information can then be output from the output interface to the computer-readable medium.
[0198] As noted, the computer-readable medium may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transient storage media), such as a hard drive, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device and provide the encoded video data to the destination device, for example, via network transmission. Similarly, a computing device in a media production facility, such as a disc stamping facility, can receive encoded video data from the source device and produce a disc that contains the encoded video data. Therefore, the computer-readable medium can be understood to include one or more computer-readable media in various forms, in several examples.
[0199] The input interface of the destination device receives information from the computer-readable medium. The information from the computer-readable medium may include syntax information defined by the video encoder, which is also used by the video decoder, and which includes syntax elements that describe characteristics and/or processing of blocks and other encoded units, for example, groups of figures (GOPs). A display device displays the decoded video data for a user, and can comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma monitor, an organic light-emitting diode (OLED) monitor, or another type of display device. Several types of application have been described.
[0200] Specific details of the encoding device 104 and the decoding device 112 are shown in Figure 16 and Figure 17, respectively. Figure 16 is a block diagram illustrating an exemplary encoding device 104 that can implement one or more of the techniques described in this disclosure. The encoding device 104 can, for example, generate the syntax structures described in this document (for example, the syntax structures of a VPS, SPS, PPS, or other syntax elements). The encoding device 104 can perform intraprediction and interprediction encryption of video blocks within video slices. As previously described, intra-encryption relies, at least in part, on spatial prediction to reduce or remove spatial redundancy within a given video frame or figure. Inter-encryption relies, at least in part, on temporal prediction to reduce or remove temporal redundancy within adjacent or surrounding frames of a video sequence. The intra mode (mode I) can refer to any one of several spatially based compression modes. Inter modes, such as unidirectional prediction (mode P) or biprediction (mode B), can refer to any of several temporally based compression modes.
[0201] The encoding device 104 includes a partition unit 35, prediction processing unit 41, filter unit 63, figure memory 64, adder 50, transform processing unit 52, quantization unit 54, and entropy coding unit 56. The prediction processing unit 41 includes a motion estimation unit 42, motion compensation unit 44, and intraprediction processing unit 46. For video block reconstruction, the encoding device 104 also includes a reverse quantization unit 58, reverse transform processing unit 60 and adder 62. The filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 63 is shown in Figure 16 as an in-loop filter, in other configurations the filter unit 63 can be implemented as a post-loop filter. A post-processing device 57 may perform additional processing on encoded video data generated by the encoding device 104. The techniques of this disclosure may, in some cases, be implemented by the encoding device 104. In other cases, however, one or more of the techniques of this disclosure can be implemented by the post-processing device 57.
[0202] As shown in Figure 16, the encoding device 104 receives video data, and the partition unit 35 partitions the data into video blocks. The partitioning can also include partitioning into slices, slice segments, parts or other larger units, as well as video block partitioning, for example, according to a quaternary tree structure of LCUs and CUs. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice can be divided into multiple video blocks (and possibly into sets of video blocks referred to as pieces). The prediction processing unit 41 can select one of a plurality of possible encryption modes, such as one of a plurality of intraprediction encryption modes or one of a plurality of interprediction encryption modes, for the current video block based on error results (for example, encryption rate and level of distortion, or the like). The prediction processing unit 41 can deliver the resulting intra-encrypted or inter-encrypted block to the adder 50 to generate residual block data and to the adder 62 to reconstruct the encoded block for use as a reference figure.
[0203] The intraprediction processing unit 46 in the prediction processing unit 41 can perform intraprediction encryption of the current video block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encrypted, to provide spatial compression. The motion estimation unit 42 and the motion compensation unit 44 in the prediction processing unit 41 perform interprediction encryption of the current video block in relation to one or more predictive blocks in one or more reference figures, to provide temporal compression.
[0204] The motion estimation unit 42 can be configured to determine the interprediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern can designate the video slices in the sequence as P slices, B slices or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. The motion estimation performed by the motion estimation unit 42 is the process of generating motion vectors, which estimate the motion for video blocks. A motion vector, for example, can indicate the displacement of a prediction unit (PU) of a video block within a current video frame or figure in relation to a predictive block within a reference figure.
[0205] A predictive block is a block that is found to closely match the PU of the video block to be encrypted, in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD), or other difference metrics. In some examples, the encoding device 104 can calculate values for sub-integer pixel positions of reference figures stored in the figure memory 64. For example, the encoding device 104 can interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference figure. Therefore, the motion estimation unit 42 can perform a motion search in relation to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
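A minimal sketch of an integer-pixel motion search over a small window, choosing the candidate with the minimum SAD; fractional precision would additionally interpolate the reference, as described above (the search window and the names are illustrative):

    import numpy as np

    def integer_motion_search(cur_block, ref_pic, bx, by, search_range=8):
        # exhaustive search in a +/- search_range window around (bx, by)
        h, w = cur_block.shape
        best_mv, best_cost = (0, 0), None
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y0, x0 = by + dy, bx + dx
                if y0 < 0 or x0 < 0 or y0 + h > ref_pic.shape[0] or x0 + w > ref_pic.shape[1]:
                    continue
                cost = np.abs(cur_block.astype(np.int64)
                              - ref_pic[y0:y0 + h, x0:x0 + w].astype(np.int64)).sum()
                if best_cost is None or cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
        return best_mv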
[0206] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-encrypted slice by comparing the position of the PU with the position of a predictive block of a reference figure. The reference figure can be selected from a first reference figure list (List 0) or a second reference figure list (List 1), each of which identifies one or more reference figures stored in the figure memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy coding unit 56 and the motion compensation unit 44.
[0207] Motion compensation, performed by the motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by the motion estimation, possibly performing interpolations for sub-pixel precision. Upon receipt of the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference figure lists. The encoding device 104 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being encrypted, forming pixel difference values. The pixel difference values form residual data for the block, and can include both luma and chroma difference components. The adder 50 represents the component or components that perform this subtraction operation. The motion compensation unit 44 can also generate syntax elements associated with the video blocks and with the video slice for use by the decoding device 112 in decoding the video blocks of the video slice.
[0208] The intraprediction processing unit 46 can intrapredict a current block, as an alternative to the interprediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intraprediction processing unit 46 can determine an intraprediction mode to use to encode a current block. In some instances, the intraprediction processing unit 46 can encode a current block using several intraprediction modes, for example, during separate coding passes, and the intraprediction processing unit 46 can select a suitable intraprediction mode to use from the tested modes. For example, the intraprediction processing unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intraprediction modes, and can select the intraprediction mode that has the best rate-distortion characteristics among the tested modes. The rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. The intraprediction processing unit 46 can calculate ratios from the distortions and rates for the various encoded blocks to determine which intraprediction mode exhibits the best rate-distortion value for the block.
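A minimal sketch of such a rate-distortion based selection, assuming a hypothetical helper encode_with(mode) that returns the distortion and the number of bits of one coding pass, and a Lagrange multiplier lam:

    def best_intra_mode(candidate_modes, encode_with, lam):
        # the mode minimizing the cost D + lambda * R is selected
        best_mode, best_cost = None, None
        for mode in candidate_modes:
            distortion, bits = encode_with(mode)
            cost = distortion + lam * bits
            if best_cost is None or cost < best_cost:
                best_mode, best_cost = mode, cost
        return best_mode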
[0209] In any case, after selecting an intraprediction mode for a block, the intraprediction processing unit 46 can provide information indicative of the intraprediction mode selected for the block to the entropy coding unit 56. The entropy coding unit 56 can encode the information that indicates the selected intraprediction mode. The encoding device 104 may include, in the transmitted bit stream, configuration data definitions of coding contexts for multiple blocks, as well as indications of a most likely intraprediction mode, an intraprediction mode index table, and a modified intraprediction mode index table for use for each of the contexts. The bit stream configuration data may include a plurality of intraprediction mode index tables and a plurality of modified intraprediction mode index tables (also referred to as codeword mapping tables).
[0210] After the prediction processing unit 41 generates the predictive block for the current video block by means of interprediction or intraprediction, the encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block can be included in one or more TUs and applied to the transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. The transform processing unit 52 can convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
[0211] The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter. In some examples, the quantization unit 54 can then scan the matrix that includes the quantized transform coefficients. Alternatively, the entropy coding unit 56 can perform the scan.
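A minimal sketch of uniform scalar quantization and the matching inverse quantization; the HEVC-style relation Qstep = 2 ** ((qp - 4) / 6) between the quantization parameter and the step size is used here for illustration:

    import numpy as np

    def quantize(coeffs, qp):
        # rounding to the nearest multiple of the step size discards
        # precision, which is where the bit-depth reduction occurs
        step = 2.0 ** ((qp - 4) / 6.0)
        return np.round(np.asarray(coeffs) / step).astype(np.int32)

    def dequantize(levels, qp):
        # inverse quantization, as performed later by unit 58 or unit 86
        step = 2.0 ** ((qp - 4) / 6.0)
        return np.asarray(levels) * step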
[0212] Following the quantization, the entropy coding unit 56 entropy encodes the quantized transform coefficients. For example, the entropy coding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. Following the entropy coding by the entropy coding unit 56, the encoded bit stream can be transmitted to the decoding device 112, or archived for later transmission or retrieval by the decoding device 112. The entropy coding unit 56 can also entropy encode the motion vectors and the other syntax elements for the current video slice being encoded.
[0213] The reverse quantization unit 58 and the reverse transform processing unit 60 apply reverse quantization and reverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference figure. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the reference figures in a reference figure list. The motion compensation unit 44 can also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. The adder 62 adds the reconstructed residual block to the motion-compensated prediction block produced by the motion compensation unit 44 to produce a reference block for storage in the figure memory 64. The reference block can be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block to interpredict a block in a subsequent video frame or figure.
[0214] The coding device 104 can perform any of the techniques described in this document. Some techniques of this disclosure have generally been described in relation to the coding device 104, but, as mentioned above, some of the techniques of this disclosure can also be implemented by the post-processing device 57.
[0215] The encoding device 104 of Figure 16 represents an example of a video encoder configured to perform the related motion derivation based on model compatibility described in the present document. The encoding device 104 can, for example, determine related motion parameters, use the related motion parameters to determine related motion for one or more blocks of one or more figures, and generate an encoded video data stream with a syntax item (for example, a syntax element, syntax structure, variable, marker, or the like) indicating that the related motion derivation mode based on model compatibility should be used for the one or more blocks. The encoding device 104 can perform any of the techniques described in this document, including the process described above in relation to Figure 15.
[0216] Figure 17 is a block diagram illustrating an exemplary decoding device 112. The decoding device 112 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, filter unit 91 and figure memory 92. The prediction processing unit 81 includes the motion compensation unit 82 and the intraprediction processing unit 84. The decoding device 112 may, in some instances, perform a decoding pass generally reciprocal to the encoding pass described in relation to the encoding device 104 of Figure 16.
[0217] During the decoding process, the decoding device 112 receives an encoded video bit stream that represents video blocks of an encoded video slice and associated syntax elements sent by the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bit stream from the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bit stream from a network entity 79, such as a server, a media-aware network element (MANE), a video editor/splicer, or another such device configured to implement one or more of the techniques described above. The network entity 79 may or may not include the encoding device 104. Some of the techniques described in this disclosure may be implemented by the network entity 79 before the network entity 79 transmits the encoded video bit stream to the decoding device 112. In some video decoding systems, the network entity 79 and the decoding device 112 may be parts of separate devices, while in other instances the functionality described in relation to the network entity 79 can be performed by the same device that comprises the decoding device 112.
[0218] The entropy decoding unit 80 of the decoding device 112 entropy decodes the bit stream to generate quantized coefficients, motion vectors and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction processing unit 81. The decoding device 112 can receive the syntax elements at the video slice level and/or at the video block level. The entropy decoding unit 80 can process and analyze both fixed-length syntax elements and variable-length syntax elements in one or more sets of parameters, such as a VPS, SPS and PPS.

[0219] When the video slice is encrypted as an intra-encrypted slice (I), the intraprediction processing unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video slice based on a signaled intraprediction mode and data of previously decoded blocks of the current frame or figure. When the video frame is encrypted as an inter-encrypted slice (that is, B, P or GPB), the motion compensation unit 82 of the prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 80. The predictive blocks can be produced from one of the reference figures in a reference figure list. The decoding device 112 can build the reference frame lists, List 0 and List 1, using standard construction techniques based on the reference figures stored in the figure memory 92.
[0220] The motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, the motion compensation unit 82 may use one or more syntax elements in a parameter set to determine a prediction mode (for example, intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (for example, a B slice, a P slice, or a GPB slice), construction information for one or more reference picture lists for the slice, motion vectors for each inter-coded video block of the slice, an inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
[0221] The motion compensation unit 82 may also perform interpolation based on interpolation filters. The motion compensation unit 82 may use the interpolation filters used by the encoding device 104 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, the motion compensation unit 82 may determine the interpolation filters used by the encoding device 104 from the received syntax elements, and may use the interpolation filters to produce predictive blocks.
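By way of a non-limiting illustration of sub-integer pixel interpolation, the sketch below uses a 2-tap bilinear filter in each direction; practical codecs such as HEVC use longer separable filters (for example, 8-tap filters for luma), but the principle is the same.

import numpy as np

def bilinear_sample(ref, x, y):
    # Sample a reference picture at fractional position (x, y),
    # assuming 0 <= x <= width - 1 and 0 <= y <= height - 1.
    h, w = ref.shape
    x0 = min(int(x), w - 2)
    y0 = min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    p = ref[y0:y0 + 2, x0:x0 + 2].astype(np.float64)
    top = (1 - fx) * p[0, 0] + fx * p[0, 1]
    bottom = (1 - fx) * p[1, 0] + fx * p[1, 1]
    return (1 - fy) * top + fy * bottom

ref = np.arange(16, dtype=np.uint8).reshape(4, 4)
print(bilinear_sample(ref, 1.5, 2.25))  # 10.5, between the four neighbors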
[0222] The inverse quantization unit 86 inverse quantizes, or de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by the encoding device 104 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. The inverse transform processing unit 88 applies an inverse transform (for example, an inverse DCT or other suitable inverse transform), an inverse integer transform, or a conceptually similar inverse transform process to the transform coefficients in order to produce residual blocks in the pixel domain.
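By way of a non-limiting illustration of how a quantization parameter can control the degree of inverse quantization, the sketch below uses the common approximation that the quantizer step size doubles every six QP steps; the integer scale tables and block-size-dependent shifts of a real codec are omitted.

import numpy as np

def dequantize(levels, qp):
    # Approximate step size: doubles every 6 QP steps.
    step = 2.0 ** ((qp - 4) / 6.0)
    return levels.astype(np.float64) * step

levels = np.array([[4, -1], [0, 2]])
print(dequantize(levels, qp=28))  # step = 16, so [[64, -16], [0, 32]]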
[0223] After the motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the decoding device 112 forms a decoded video block by summing the residual blocks from the inverse transform processing unit 88 with the corresponding predictive blocks generated by the motion compensation unit 82. The adder 90 represents the component or components that perform this summing operation. If desired, loop filters (in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or to otherwise improve video quality. The filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 91 is shown in Figure 17 as an in-loop filter, in other configurations the filter unit 91 may be implemented as a post-loop filter. The decoded video blocks in a given frame or picture are then stored in the picture memory 92, which stores reference pictures used for subsequent motion compensation. The picture memory 92 also stores decoded video for later presentation on a display device, such as the video destination device 122 shown in Figure 1.
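Per sample, the summing operation performed by the adder 90 reduces to a clipped addition, as in the following non-limiting sketch.

import numpy as np

def reconstruct(pred, resid, bit_depth=8):
    # Decoded block = prediction + residual, clipped to the sample range.
    hi = (1 << bit_depth) - 1
    total = pred.astype(np.int32) + resid.astype(np.int32)
    return np.clip(total, 0, hi)

pred = np.full((2, 2), 250, dtype=np.uint8)
resid = np.array([[10, -10], [0, 5]], dtype=np.int16)
print(reconstruct(pred, resid))  # [[255 240] [250 255]]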
[0224] The decoding device 112 of Figure 17 represents an example of a video decoder configured to perform the template matching based affine motion derivation described in this document. The decoding device 112 can, for example, determine affine motion parameters and use the affine motion parameters to determine affine motion for one or more blocks of one or more pictures. The decoding device 112 can perform any of the techniques described in this document, including the process described above in relation to Figure 14.
[0225] In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the subject matter of this application is not limited thereto. Thus, although illustrative embodiments of the application have been described in detail herein, it should be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the subject matter described above may be used individually or jointly. In addition, the embodiments may be used in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For purposes of illustration, the methods were described in a particular order. It should be appreciated that, in alternative embodiments, the methods may be performed in an order different from that described.
[0226] Where components are described as being configured to perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (for example, microprocessors or other suitable electronic circuits) to perform the operation, or any combination thereof.
[0227] The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of this application.
[0228] The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices such as general-purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer, such as propagated signals or waves.
[0229] The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. A general-purpose processor may be a microprocessor; but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or apparatus suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).
Claims (58)
1. Method of deriving one or more sets of affine motion parameters in a decoder, comprising:
obtaining, by the decoder, video data from an encoded video bitstream, the video data including at least a current picture and a reference picture; and
determining, by the decoder, a set of affine motion parameters for a current block of the current picture, the set of affine motion parameters being used to perform motion compensation prediction for the current block, wherein the set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture.
2. Method according to claim 1, further comprising:
determining motion vectors for a plurality of sub-blocks of the current block using the set of affine motion parameters determined for the current block.
3. Method according to claim 1, further comprising:
determining motion vectors for a plurality of pixels of the current block using the set of affine motion parameters determined for the current block.
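By way of a non-limiting illustration of claims 2 and 3, the sketch below derives one motion vector per sub-block (or per pixel, by shrinking the sub-block size to 1) from a set of affine parameters; the 4-parameter rotation/zoom/translation model and all names are assumptions of this sketch, not definitions from the claims.

def affine_mv(params, x, y):
    # Illustrative 4-parameter model: mv = (a*x - b*y + c, b*x + a*y + d).
    a, b, c, d = params
    return a * x - b * y + c, b * x + a * y + d

def subblock_mvs(params, block_w, block_h, sub=4):
    # One motion vector per sub x sub sub-block, evaluated at its center.
    mvs = {}
    for y0 in range(0, block_h, sub):
        for x0 in range(0, block_w, sub):
            cx, cy = x0 + sub / 2.0, y0 + sub / 2.0
            mvs[(x0, y0)] = affine_mv(params, cx, cy)
    return mvs

print(subblock_mvs((0.01, 0.0, 1.5, -0.5), 16, 16))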
4. Method according to claim 1, wherein determining the set of affine motion parameters for the current block includes:
obtaining, by the decoder, a set of initial affine motion parameters;
deriving, by the decoder, one or more affine motion vectors for one or more pixels in the current affine template of the current block using the set of initial affine motion parameters, wherein the current affine template of the current block includes reconstructed pixels neighboring the current block;
determining, by the decoder, one or more pixels in the reference affine template of the reference picture using the one or more affine motion vectors derived for the one or more pixels in the current affine template;
minimizing, by the decoder, an error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors; and
determining, by the decoder, the set of affine motion parameters for one or more control points of the current affine template based on the minimized error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template.
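By way of a non-limiting illustration of claims 4 to 6, the sketch below refines an initial parameter set by repeatedly perturbing it, scoring each candidate set by the sum of absolute differences (SAD) between the current template and the reference pixels the candidate maps it to, and keeping the candidate with the lowest metric; the greedy search, nearest-integer sampling, and 4-parameter model are simplifications assumed by this sketch.

import numpy as np

def affine_mv(p, x, y):
    a, b, c, d = p  # illustrative 4-parameter model
    return a * x - b * y + c, b * x + a * y + d

def template_sad(p, cur, ref, template_xy):
    # SAD between current-template pixels and the reference pixels that
    # the candidate affine parameters map them to.
    h, w = ref.shape
    sad = 0.0
    for x, y in template_xy:
        mvx, mvy = affine_mv(p, x, y)
        rx = min(max(int(round(x + mvx)), 0), w - 1)
        ry = min(max(int(round(y + mvy)), 0), h - 1)
        sad += abs(float(cur[y, x]) - float(ref[ry, rx]))
    return sad

def derive_affine(init, cur, ref, template_xy):
    # Greedy coarse-to-fine refinement: keep any candidate that lowers SAD,
    # so the candidate set with the lowest quality metric is selected.
    best, best_sad = list(init), template_sad(init, cur, ref, template_xy)
    for step in (1.0, 0.5, 0.25):
        improved = True
        while improved:
            improved = False
            for i in range(4):
                for delta in (-step, step):
                    cand = list(best)
                    cand[i] += delta
                    sad = template_sad(cand, cur, ref, template_xy)
                    if sad < best_sad:
                        best, best_sad, improved = cand, sad, True
    return tuple(best), best_sad

ys, xs = np.mgrid[0:32, 0:32]
ref = (5 * ys + 3 * xs).astype(np.uint8)  # smooth ramp, values <= 248
cur = np.roll(ref, -1, axis=1)            # true motion is mv = (1, 0)
bx, by, bw, bh = 8, 8, 8, 8               # current block position and size
template_xy = ([(x, by - 1) for x in range(bx - 1, bx + bw)]
               + [(bx - 1, y) for y in range(by, by + bh)])
print(derive_affine((0.0, 0.0, 0.0, 0.0), cur, ref, template_xy))
# -> ((0.0, 0.0, 1.0, 0.0), 0.0): the one-pixel translation is recovered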
5. Method according to claim 4, wherein determining the set of affine motion parameters for the one or more control points of the current affine template includes:
determining a plurality of sets of affine motion parameters for the one or more control points of the current affine template using at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors;
determining a quality metric for each set of affine motion parameters of the plurality of sets of affine motion parameters; and
selecting, for the one or more control points of the current affine template, the set of affine motion parameters, of the plurality of sets of affine motion parameters, that has the lowest quality metric among the plurality of sets of affine motion parameters.
6. Method according to claim 5, wherein the quality metric includes a sum of absolute differences (SAD).
7. Method according to claim 4, wherein the set of initial affine motion parameters is determined based on a translational motion vector determined for the current block.
8. Method according to claim 7, wherein the translational motion vector is determined using frame rate up-conversion (FRUC) template matching.
9. Method according to claim 4, wherein the set of initial affine motion parameters is determined based on an affine motion vector of a neighboring block of the current block.
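By way of a non-limiting illustration of claims 7 to 9, under the 4-parameter model assumed in the sketches above, both ways of obtaining the initial parameter set reduce to simple assignments; the function names are hypothetical.

def init_from_translational_mv(mv):
    # Claims 7 and 8: start from a translational motion vector, e.g. one
    # found by FRUC template matching: zero rotation/zoom terms, and the
    # translation terms set to the motion vector components.
    mvx, mvy = mv
    return (0.0, 0.0, mvx, mvy)

def init_from_neighbor(neighbor_params):
    # Claim 9: inherit the affine motion of a neighboring block.
    return tuple(neighbor_params)

print(init_from_translational_mv((1.0, -2.0)))  # (0.0, 0.0, 1.0, -2.0)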
10. Method according to claim 1, wherein no affine motion parameters are decoded from the encoded video bitstream to determine the set of affine motion parameters.
11. Method according to claim 1, wherein the current affine template of the current block includes one or more samples spatially neighboring the current block.
12. Method according to claim 11, wherein the spatially neighboring samples include samples from one or more of a top neighboring block or a left neighboring block.
13. Method according to claim 1, wherein the current affine template includes an L-shaped block, the L-shaped block including samples from a top neighboring block of the current block and samples from a left neighboring block of the current block.
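By way of a non-limiting illustration of claims 11 to 13, the sketch below enumerates the sample positions of an L-shaped template of thickness t drawn from the reconstructed rows above and columns to the left of a block; the names and the thickness parameter are assumptions of this sketch.

def l_template_coords(x0, y0, w, h, t=2):
    # Top strip (including the top-left corner), then the left strip.
    coords = [(x, y) for y in range(y0 - t, y0)
              for x in range(x0 - t, x0 + w)]
    coords += [(x, y) for y in range(y0, y0 + h)
               for x in range(x0 - t, x0)]
    return coords

print(len(l_template_coords(8, 8, 16, 16)))  # 2*(16+2) + 2*16 = 68 samples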
14. Decoder for deriving one or more sets of affine motion parameters, comprising:
a memory configured to store video data from an encoded video bitstream; and
a processor configured to:
obtain the video data from the encoded video bitstream, the obtained video data including at least a current picture and a reference picture; and
determine a set of affine motion parameters for a current block of the current picture, the set of affine motion parameters being used to perform motion compensation prediction for the current block, wherein the set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture.
15. Decoder according to claim 14, wherein the processor is further configured to:
determine motion vectors for a plurality of sub-blocks of the current block using the set of affine motion parameters determined for the current block.
16. Decoder according to claim 14, wherein the processor is further configured to:
determine motion vectors for a plurality of pixels of the current block using the set of affine motion parameters determined for the current block.
17. Decoder according to claim 14, wherein determining the set of affine motion parameters for the current block includes:
obtaining a set of initial affine motion parameters;
deriving one or more affine motion vectors for one or more pixels in the current affine template of the current block using the set of initial affine motion parameters, wherein the current affine template of the current block includes reconstructed pixels neighboring the current block;
determining one or more pixels in the reference affine template of the reference picture using the one or more affine motion vectors derived for the one or more pixels in the current affine template;
minimizing an error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors; and
determining the set of affine motion parameters for one or more control points of the current affine template based on the minimized error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template.
18. Decoder according to claim 17, wherein determining the set of affine motion parameters for the one or more control points of the current affine template includes:
determining a plurality of sets of affine motion parameters for the one or more control points of the current affine template using at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors;
determining a quality metric for each set of affine motion parameters of the plurality of sets of affine motion parameters; and
selecting, for the one or more control points of the current affine template, the set of affine motion parameters, of the plurality of sets of affine motion parameters, that has the lowest quality metric among the plurality of sets of affine motion parameters.
19. Decoder according to claim 18, wherein the quality metric includes a sum of absolute differences (SAD).
20. Decoder according to claim 17, wherein the set of initial affine motion parameters is determined based on a translational motion vector determined for the current block.
21. Decoder according to claim 20, wherein the translational motion vector is determined using frame rate up-conversion (FRUC) template matching.
22. Decoder according to claim 17, wherein the set of initial affine motion parameters is determined based on an affine motion vector of a neighboring block of the current block.
23. Decoder according to claim 14, wherein no affine motion parameters are decoded from the encoded video bitstream to determine the set of affine motion parameters.
24. Decoder according to claim 14, wherein the current affine template of the current block includes one or more samples spatially neighboring the current block.
25. Decoder according to claim 24, wherein the spatially neighboring samples include samples from one or more of a top neighboring block or a left neighboring block.
26. Decoder according to claim 14, wherein the current affine template includes an L-shaped block, the L-shaped block including samples from a top neighboring block of the current block and samples from a left neighboring block of the current block.
27. Decoder according to claim 14, wherein the decoder is part of a mobile device with a display for displaying decoded video data.
28. Decoder according to claim 14, wherein the decoder is part of a mobile device with a camera for capturing pictures.
29. Method of encoding video data, comprising:
obtaining video data, the video data including at least a current picture and a reference picture;
determining a set of affine motion parameters for a current block of the current picture, the set of affine motion parameters being used to perform motion compensation prediction for the current block, wherein the set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture; and
generating an encoded video bitstream, wherein the encoded video bitstream includes a syntax item indicating that the template matching based affine motion derivation mode is to be used by a decoder for the current block, and wherein the encoded video bitstream does not include any affine motion parameters for determining the set of affine motion parameters.
30. Method according to claim 29, further comprising:
determining motion vectors for a plurality of sub-blocks of the current block using the set of affine motion parameters determined for the current block.
31. Method according to claim 29, further comprising:
determining motion vectors for a plurality of pixels of the current block using the set of affine motion parameters determined for the current block.
32. Method according to claim 29, wherein determining the set of affine motion parameters for the current block includes:
obtaining a set of initial affine motion parameters;
deriving one or more affine motion vectors for one or more pixels in the current affine template of the current block using the set of initial affine motion parameters, wherein the current affine template of the current block includes reconstructed pixels neighboring the current block;
determining one or more pixels in the reference affine template of the reference picture using the one or more affine motion vectors derived for the one or more pixels in the current affine template;
minimizing an error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors; and
determining the set of affine motion parameters for one or more control points of the current affine template based on the minimized error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template.
33. Method according to claim 32, wherein determining the set of affine motion parameters for the one or more control points of the current affine template includes:
determining a plurality of sets of affine motion parameters for the one or more control points of the current affine template using at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors;
determining a quality metric for each set of affine motion parameters of the plurality of sets of affine motion parameters; and
selecting, for the one or more control points of the current affine template, the set of affine motion parameters, of the plurality of sets of affine motion parameters, that has the lowest quality metric among the plurality of sets of affine motion parameters.
34. Method according to claim 33, wherein the quality metric includes a sum of absolute differences (SAD).
35. Method according to claim 32, wherein the set of initial affine motion parameters is determined based on a translational motion vector determined for the current block.
36. Method according to claim 35, wherein the translational motion vector is determined using frame rate up-conversion (FRUC) template matching.
37. Method according to claim 32, wherein the set of initial affine motion parameters is determined based on an affine motion vector of a neighboring block of the current block.
38. Method according to claim 29, wherein the current affine template of the current block includes one or more samples spatially neighboring the current block.
39. Method according to claim 38, wherein the spatially neighboring samples include samples from one or more of a top neighboring block or a left neighboring block.
40. Method according to claim 29, wherein the current affine template includes an L-shaped block, the L-shaped block including samples from a top neighboring block of the current block and samples from a left neighboring block of the current block.
41. Method according to claim 29, further comprising storing the encoded video bitstream.
42. Method according to claim 29, further comprising transmitting the encoded video bitstream.
43. Encoder for encoding video data, comprising:
a memory configured to store video data; and
a processor configured to:
obtain the video data, the video data including at least a current picture and a reference picture;
determine a set of affine motion parameters for a current block of the current picture, the set of affine motion parameters being used to perform motion compensation prediction for the current block, wherein the set of affine motion parameters is determined using a current affine template of the current block and a reference affine template of the reference picture; and
generate an encoded video bitstream, wherein the encoded video bitstream includes a syntax item indicating that the template matching based affine motion derivation mode is to be used by a decoder for the current block, and wherein the encoded video bitstream does not include any affine motion parameters for determining the set of affine motion parameters.
44. Encoder according to claim 43, wherein the processor is further configured to:
determine motion vectors for a plurality of sub-blocks of the current block using the set of affine motion parameters determined for the current block.
45. Encoder according to claim 43, wherein the processor is further configured to:
determine motion vectors for a plurality of pixels of the current block using the set of affine motion parameters determined for the current block.
46. Encoder according to claim 43, wherein determining the set of affine motion parameters for the current block includes:
obtaining a set of initial affine motion parameters;
deriving one or more affine motion vectors for one or more pixels in the current affine template of the current block using the set of initial affine motion parameters, wherein the current affine template of the current block includes reconstructed pixels neighboring the current block;
determining one or more pixels in the reference affine template of the reference picture using the one or more affine motion vectors derived for the one or more pixels in the current affine template;
minimizing an error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors; and
determining the set of affine motion parameters for one or more control points of the current affine template based on the minimized error between at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template.
47. Encoder according to claim 46, wherein determining the set of affine motion parameters for the one or more control points of the current affine template includes:
determining a plurality of sets of affine motion parameters for the one or more control points of the current affine template using at least the one or more pixels in the current affine template and the one or more pixels in the reference affine template determined using the one or more affine motion vectors;
determining a quality metric for each set of affine motion parameters of the plurality of sets of affine motion parameters; and
selecting, for the one or more control points of the current affine template, the set of affine motion parameters, of the plurality of sets of affine motion parameters, that has the lowest quality metric among the plurality of sets of affine motion parameters.
48. Encoder according to claim 47, wherein the quality metric includes a sum of absolute differences (SAD).
49. Encoder according to claim 46, wherein the set of initial affine motion parameters is determined based on a translational motion vector determined for the current block.
50. Encoder according to claim 49, wherein the translational motion vector is determined using frame rate up-conversion (FRUC) template matching.
51. Encoder according to claim 46, wherein the set of initial affine motion parameters is determined based on an affine motion vector of a neighboring block of the current block.
52. Encoder according to claim 43, wherein the current affine template of the current block includes one or more samples spatially neighboring the current block.
53. Encoder according to claim 52, wherein the spatially neighboring samples include samples from one or more of a top neighboring block or a left neighboring block.
54. Encoder according to claim 43, wherein the current affine template includes an L-shaped block, the L-shaped block including samples from a top neighboring block of the current block and samples from a left neighboring block of the current block.
55. Encoder according to claim 43, wherein the processor is configured to store the encoded video bitstream in the memory.
56. Encoder according to claim 43, further comprising a transmitter configured to transmit the encoded video bitstream.
57. Encoder according to claim 43, wherein the encoder is part of a mobile device with a display for displaying decoded video data.
58. Encoder according to claim 43, wherein the encoder is part of a mobile device with a camera for capturing pictures.